1
Kwan BPM, Lynch BM, Edbrooke L, Hodge A, Swain CTV. Are the Relationships of Physical Activity and Television Viewing Time With Mortality Robust to Confounding? A Study, Utilizing E-Values, From the Melbourne Collaborative Cohort Study. J Phys Act Health 2024; 21:1105-1113. [PMID: 39322218 DOI: 10.1123/jpah.2024-0218] [Received: 03/25/2024] [Revised: 07/11/2024] [Accepted: 07/25/2024] [Indexed: 09/27/2024]
Abstract
BACKGROUND Physical activity and sedentary behavior are associated with health outcomes. However, evidence may be affected by confounding bias. This study aimed to examine the relationships of physical activity and television (TV) viewing time with all-cause, cardiovascular, and cancer mortality in a cohort of Australian adults, and determine the robustness of these relationships to residual and unmeasured confounding. METHODS Data from 27,317 Melbourne Collaborative Cohort Study participants (mean age = 66) were used. Physical activity was assessed using the International Physical Activity Questionnaire-Short Form and categorized as insufficient, sufficient, or more than sufficient. TV viewing time was categorized as low, moderate, or high. Multivariable Cox regression models were used to evaluate associations of interest. E-values were calculated to assess the strength of unmeasured confounders required to negate the observed results. RESULTS For highest versus lowest physical activity category, the hazard ratio was 0.67 (95% confidence interval, 0.56-0.81) for all-cause mortality; E-values ranged between 1.79 and 2.44. Results were similar for cardiovascular mortality; however, hazard ratios were lower (0.72; 95% confidence interval, 0.51-1.01) and E-values much smaller (1.00-2.12) for cancer mortality. For highest versus lowest TV viewing time category, the hazard ratio was 1.08 (1.01-1.15) for all-cause mortality; E-values ranged between 1.00 and 1.37. Results were similar for cardiovascular and cancer mortality. CONCLUSIONS Physical activity and TV viewing time were associated with mortality. The robustness to unmeasured/residual confounding was moderate for physical activity (all-cause and cardiovascular mortality), but weaker for physical activity (cancer mortality) and TV viewing time in this study of Australian adults.
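The E-value calculation mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the standard closed-form E-value for a risk ratio and treating each hazard ratio as an approximation of a risk ratio; the function names are hypothetical and this is not the authors' code.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a ratio estimate, using RR + sqrt(RR * (RR - 1)).

    Assumption: the hazard ratio is treated as an approximate risk ratio.
    """
    if rr <= 0:
        raise ValueError("ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr  # protective estimates are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_for_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (1.0 if the interval includes 1)."""
    if lower <= 1.0 <= upper:
        return 1.0
    return e_value(upper if upper < 1.0 else lower)


if __name__ == "__main__":
    # Estimates quoted in the abstract above
    print(round(e_value(1.08), 2))             # TV viewing, all-cause mortality
    print(round(e_value(0.72), 2))             # physical activity, cancer mortality
    print(e_value_for_ci(0.51, 1.01))          # interval crossing the null
```

Under these assumptions, the point estimates 1.08 and 0.72 give E-values of about 1.37 and 2.12, and an interval that includes 1 gives 1.00, in line with the ranges quoted in the abstract. Other reported values may come from different models or unrounded estimates, so exact agreement should not be expected.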
Affiliation(s)
- Baldwin Pok Man Kwan
  - Melbourne School of Population and Global Health, The University of Melbourne, Carlton, VIC, Australia
- Brigid M Lynch
  - Melbourne School of Population and Global Health, The University of Melbourne, Carlton, VIC, Australia
  - Cancer Epidemiology Division, Cancer Council Victoria, East Melbourne, VIC, Australia
  - Physical Activity Laboratory, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
- Lara Edbrooke
  - Department of Physiotherapy, Melbourne School of Health Sciences, The University of Melbourne, Carlton, VIC, Australia
  - Department of Health Services Research, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
- Allison Hodge
  - Melbourne School of Population and Global Health, The University of Melbourne, Carlton, VIC, Australia
  - Cancer Epidemiology Division, Cancer Council Victoria, East Melbourne, VIC, Australia
- Christopher T V Swain
  - Cancer Epidemiology Division, Cancer Council Victoria, East Melbourne, VIC, Australia
  - Department of Physiotherapy, Melbourne School of Health Sciences, The University of Melbourne, Carlton, VIC, Australia
2
Shepherd DA, Amor DJ, Moreno-Betancur M. Statistical analysis of observational studies in disability research. Dev Med Child Neurol 2024; 66:1408-1418. [PMID: 38721699 DOI: 10.1111/dmcn.15948] [Received: 12/29/2023] [Revised: 04/03/2024] [Accepted: 04/04/2024] [Indexed: 10/04/2024]
Abstract
Observational studies have a critical role in disability research, providing the opportunity to address a range of research questions. Over the past decades, there have been substantial shifts and developments in statistical methods for observational studies, most notably for causal inference. In this review, we provide an overview of modern design and analysis concepts critical for observational studies, drawing examples from the field of disability research and highlighting the challenges in this field, to inform the readership on important statistical considerations for their studies. WHAT THIS PAPER ADDS: Descriptive research questions have specific analytical complexities, so careful statistical design before analysis is critical. Prediction research aims to produce a model with good predictive ability and requires thorough statistical design prior to analysis. Causal research requires careful statistical analysis planning, facilitated by modern causal inference concepts and analytical methods. Adopting these approaches will strengthen the quality of observational studies addressing a range of research questions in the disability space.
Affiliation(s)
- Daisy A Shepherd
  - Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Neurodisability and Rehabilitation, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
- David J Amor
  - Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Neurodisability and Rehabilitation, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
  - Neurodevelopment and Disability, Royal Children's Hospital, Melbourne, Victoria, Australia
- Margarita Moreno-Betancur
  - Department of Paediatrics, University of Melbourne, Melbourne, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
3
Raman SR, Hammill BG, Shaw PA, Lee H, Toh S, Connolly JG, Dandreo KJ, Nalawade V, Tian F, Liu W, Li J, Hernández-Muñoz JJ, Glynn RJ, Desai RJ, Weberpals J. Analyzing missingness patterns in real-world data using the SMDI toolkit: application to a linked EHR-claims pharmacoepidemiology study. BMC Med Res Methodol 2024; 24:246. [PMID: 39427148 PMCID: PMC11490010 DOI: 10.1186/s12874-024-02330-2] [Received: 03/07/2024] [Accepted: 09/05/2024] [Indexed: 10/21/2024] Open
Abstract
BACKGROUND Missing data in confounding variables present a frequent challenge in generating evidence using real-world data, including electronic health records (EHR). Our objective was to apply a recently published toolkit for characterizing missing data patterns and based on the toolkit results about likely missingness mechanisms, illustrate the decision-making process for analyses in an empirical case example. METHODS We utilized the Structural Missing Data Investigations (SMDI) toolkit to characterize missing data patterns in the context of a pharmacoepidemiology study comparing cardiovascular outcomes of initiating sodium-glucose-cotransporter-2 inhibitors (SGLT2i) and dipeptidyl peptidase-4 inhibitors (DPP-4i) among older adults. The study used a linked EHR-Medicare claims dataset from Duke Health patients (2015-2017), focusing on partially observed confounders from EHR data (HbA1c lab and body mass index [BMI] values). Our analysis incorporated SMDI's descriptive functions and diagnostic tests to explore missingness patterns and determine missingness mitigation approaches. We used findings from these investigations to inform estimation of adjusted hazard ratios comparing the two classes of medications. RESULTS High levels of missingness were noted for important confounding variables including HbA1c (63.6%) and BMI (16.5%). Diagnostic tests resulted in output that described: 1) the distributions of patient characteristics, exposure, and outcome between patients with or without an observed value of the partially observed covariate, 2) the ability to predict missingness based on observed covariates, and 3) estimate if the missingness of a partially observed covariate is differential with respect to the outcome. There was evidence that missingness could be sufficiently described using observed data, which allowed multiple imputation by chained equations using random forests to address missing confounder data in estimating treatment effects. 
Multiple imputation resulted in improved alignment of effect estimates with previous studies. CONCLUSIONS We were able to demonstrate the practical application of the SMDI toolkit in a real-world setting. Application of the SMDI toolkit and the resulting insights of potential missingness patterns can inform the choice of appropriate analytic methods and increase transparency of research methods in handling missing data. This type of approach can inform analytic decision making and may increase our ability to generate evidence from real-world data.
Affiliation(s)
- Sudha R Raman
  - Department of Population Health Sciences, Duke University School of Medicine, Durham, USA
- Bradley G Hammill
  - Department of Population Health Sciences, Duke University School of Medicine, Durham, USA
- Pamela A Shaw
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, USA
- Hana Lee
  - Office of Biostatistics, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, USA
- Sengwee Toh
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, USA
- John G Connolly
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, USA
- Kimberly J Dandreo
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, USA
- Vinit Nalawade
  - Department of Population Health Sciences, Duke University School of Medicine, Durham, USA
- Fang Tian
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, USA
- Wei Liu
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, USA
- Jie Li
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, USA
- José J Hernández-Muñoz
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, USA
- Robert J Glynn
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Rishi J Desai
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
- Janick Weberpals
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, USA
4
Mainzer RM, Moreno-Betancur M, Nguyen CD, Simpson JA, Carlin JB, Lee KJ. Gaps in the usage and reporting of multiple imputation for incomplete data: findings from a scoping review of observational studies addressing causal questions. BMC Med Res Methodol 2024; 24:193. [PMID: 39232661 PMCID: PMC11373423 DOI: 10.1186/s12874-024-02302-6] [Received: 05/21/2024] [Accepted: 08/02/2024] [Indexed: 09/06/2024] Open
Abstract
BACKGROUND Missing data are common in observational studies and often occur in several of the variables required when estimating a causal effect, i.e. the exposure, outcome and/or variables used to control for confounding. Analyses involving multiple incomplete variables are not as straightforward as analyses with a single incomplete variable. For example, in the context of multivariable missingness, the standard missing data assumptions ("missing completely at random", "missing at random" [MAR], "missing not at random") are difficult to interpret and assess. It is not clear how the complexities that arise due to multivariable missingness are being addressed in practice. The aim of this study was to review how missing data are managed and reported in observational studies that use multiple imputation (MI) for causal effect estimation, with a particular focus on missing data summaries, missing data assumptions, primary and sensitivity analyses, and MI implementation. METHODS We searched five top general epidemiology journals for observational studies that aimed to answer a causal research question and used MI, published between January 2019 and December 2021. Article screening and data extraction were performed systematically. RESULTS Of the 130 studies included in this review, 108 (83%) derived an analysis sample by excluding individuals with missing data in specific variables (e.g., outcome) and 114 (88%) had multivariable missingness within the analysis sample. Forty-four (34%) studies provided a statement about missing data assumptions, 35 of which stated the MAR assumption, but only 11/44 (25%) studies provided a justification for these assumptions. The number of imputations, MI method and MI software were generally well-reported (71%, 75% and 88% of studies, respectively), while aspects of the imputation model specification were not clear for more than half of the studies. 
A secondary analysis that used a different approach to handle the missing data was conducted in 69/130 (53%) studies. Of these 69 studies, 68 (99%) lacked a clear justification for the secondary analysis. CONCLUSION Effort is needed to clarify the rationale for and improve the reporting of MI for estimation of causal effects from observational data. We encourage greater transparency in making and reporting analytical decisions related to missing data.
Affiliation(s)
- Rheanna M Mainzer
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, 3052, Australia
- Margarita Moreno-Betancur
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, 3052, Australia
- Cattram D Nguyen
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, 3052, Australia
- Julie A Simpson
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, 3052, Australia
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- John B Carlin
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, 3052, Australia
- Katherine J Lee
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, 3052, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, 3052, Australia
5
Marks IR, Doyle LW, Mainzer RM, Spittle AJ, Clark M, Boland RA, Anderson PJ, Cheong JL. Neurosensory, cognitive and academic outcomes at 8 years in children born 22-23 weeks' gestation compared with more mature births. Arch Dis Child Fetal Neonatal Ed 2024; 109:511-518. [PMID: 38395594 DOI: 10.1136/archdischild-2023-326277] [Received: 08/31/2023] [Accepted: 01/09/2024] [Indexed: 02/25/2024]
Abstract
Despite providing intensive care to more infants born <24 weeks' gestation, data on school-age outcomes, critical for counselling and decision-making, are sparse. OBJECTIVE To compare major neurosensory, cognitive and academic impairment among school-aged children born extremely preterm at 22-23 weeks' gestation (EP22-23) with those born 24-25 weeks (EP24-25), 26-27 weeks (EP26-27) and term (≥37 weeks). DESIGN Three prospective longitudinal cohorts. SETTING Victoria, Australia. PARTICIPANTS All EP live births (22-27 weeks) and term-born controls born in 1991-1992, 1997 and 2005. MAIN OUTCOME MEASURES At 8 years, major neurosensory disability (any of moderate/severe cerebral palsy, IQ <-2 SD relative to controls, blindness or deafness), motor, cognitive and academic impairment, executive dysfunction and poor health utility. Risk ratios (RRs) and risk differences between EP22-23 (reference) and other gestational age groups were estimated using generalised linear models, adjusted for era of birth, social risk and multiple birth. RESULTS The risk of major neurosensory disability was higher for EP22-23 (n=21) than more mature groups (168 EP24-25; 312 EP26-27; 576 term), with increasing magnitude of difference as the gestation increased (adjusted RR (95% CI) compared with EP24-25: 1.39 (0.70 to 2.76), p=0.35; EP26-27: 1.85 (0.95 to 3.61), p=0.07; term: 13.9 (5.75 to 33.7), p<0.001). Similar trends were seen with other outcomes. Two-thirds of EP22-23 survivors were free of major neurosensory disability. CONCLUSIONS Although children born EP22-23 experienced higher rates of disability and impairment at 8 years than children born more maturely, many were free of major neurosensory disability. These data support providing active care to infants born EP22-23.
Affiliation(s)
- India RM Marks
  - Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Lex W Doyle
  - Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Newborn Research, Royal Women's Hospital, Melbourne, Victoria, Australia
  - Department of Obstetrics and Gynaecology, University of Melbourne, Parkville, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Rheanna M Mainzer
  - Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Alicia J Spittle
  - Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Physiotherapy, University of Melbourne, Parkville, Victoria, Australia
- Marissa Clark
  - Department of Neonatology, Monash Medical Centre, Clayton, Victoria, Australia
- Rosemarie A Boland
  - Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Obstetrics and Gynaecology, University of Melbourne, Parkville, Victoria, Australia
- Peter J Anderson
  - Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Turner Institute for Brain and Mental Health & School of Psychological Sciences, Monash University, Clayton, Victoria, Australia
- Jeanie LY Cheong
  - Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Newborn Research, Royal Women's Hospital, Melbourne, Victoria, Australia
  - Department of Obstetrics and Gynaecology, University of Melbourne, Parkville, Victoria, Australia
  - Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
6
Dashti SG, Lee KJ, Simpson JA, White IR, Carlin JB, Moreno-Betancur M. Handling missing data when estimating causal effects with targeted maximum likelihood estimation. Am J Epidemiol 2024; 193:1019-1030. [PMID: 38400653 PMCID: PMC11228874 DOI: 10.1093/aje/kwae012] [Received: 12/18/2022] [Revised: 02/04/2024] [Accepted: 02/20/2024] [Indexed: 02/25/2024] Open
Abstract
Targeted maximum likelihood estimation (TMLE) is increasingly used for doubly robust causal inference, but how missing data should be handled when using TMLE with data-adaptive approaches is unclear. Based on data (1992-1998) from the Victorian Adolescent Health Cohort Study, we conducted a simulation study to evaluate 8 missing-data methods in this context: complete-case analysis, extended TMLE incorporating an outcome-missingness model, the missing covariate missing indicator method, and 5 multiple imputation (MI) approaches using parametric or machine-learning models. We considered 6 scenarios that varied in terms of exposure/outcome generation models (presence of confounder-confounder interactions) and missingness mechanisms (whether outcome influenced missingness in other variables and presence of interaction/nonlinear terms in missingness models). Complete-case analysis and extended TMLE had small biases when outcome did not influence missingness in other variables. Parametric MI without interactions had large bias when exposure/outcome generation models included interactions. Parametric MI including interactions performed best in bias and variance reduction across all settings, except when missingness models included a nonlinear term. When choosing a method for handling missing data in the context of TMLE, researchers must consider the missingness mechanism and, for MI, compatibility with the analysis method. In many settings, a parametric MI approach that incorporates interactions and nonlinearities is expected to perform well.
Affiliation(s)
- S Ghazaleh Dashti
  - Corresponding author: S. Ghazaleh Dashti, Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Royal Children’s Hospital, 50 Flemington Road, Parkville, VIC 3052, Australia
7
Weberpals J, Raman SR, Shaw PA, Lee H, Russo M, Hammill BG, Toh S, Connolly JG, Dandreo KJ, Tian F, Liu W, Li J, Hernández-Muñoz JJ, Glynn RJ, Desai RJ. A Principled Approach to Characterize and Analyze Partially Observed Confounder Data from Electronic Health Records. Clin Epidemiol 2024; 16:329-343. [PMID: 38798915 PMCID: PMC11127690 DOI: 10.2147/clep.s436131] [Received: 08/19/2023] [Accepted: 04/09/2024] [Indexed: 05/29/2024] Open
Abstract
Objective Partially observed confounder data pose challenges to the statistical analysis of electronic health records (EHR) and systematic assessments of potentially underlying missingness mechanisms are lacking. We aimed to provide a principled approach to empirically characterize missing data processes and investigate performance of analytic methods. Methods Three empirical sub-cohorts of diabetic SGLT2 or DPP4-inhibitor initiators with complete information on HbA1c, BMI and smoking as confounders of interest (COI) formed the basis of data simulation under a plasmode framework. A true null treatment effect, including the COI in the outcome generation model, and four missingness mechanisms for the COI were simulated: completely at random (MCAR), at random (MAR), and two not at random (MNAR) mechanisms, where missingness was dependent on an unmeasured confounder and on the value of the COI itself. We evaluated the ability of three groups of diagnostics to differentiate between mechanisms: 1) differences in characteristics between patients with or without the observed COI (using averaged standardized mean differences [ASMD]), 2) predictive ability of the missingness indicator based on observed covariates, and 3) association of the missingness indicator with the outcome. We then compared analytic methods including "complete case", inverse probability weighting, single and multiple imputation in their ability to recover true treatment effects. Results The diagnostics successfully identified characteristic patterns of simulated missingness mechanisms. For MAR, but not MCAR, the patient characteristics showed substantial differences (median ASMD 0.20 vs 0.05) and consequently, discrimination of the prediction models for missingness was also higher (0.59 vs 0.50). For MNAR, but not MAR or MCAR, missingness was significantly associated with the outcome even in models adjusting for other observed covariates. 
Comparing analytic methods, multiple imputation using a random forest algorithm resulted in the lowest root-mean-squared-error. Conclusion Principled diagnostics provided reliable insights into missingness mechanisms. When assumptions allow, multiple imputation with nonparametric models could help reduce bias.
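The first diagnostic described in the abstract above (standardized mean differences between patients with and without an observed confounder) can be illustrated on synthetic data. This is a rough sketch, assuming a single fully observed covariate (a hypothetical `age`) and two hand-picked missingness mechanisms; it does not reproduce the plasmode simulation or the reported ASMD values.

```python
import math
import random
import statistics


def smd(a, b):
    """Absolute standardized mean difference between two groups."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return abs(statistics.fmean(a) - statistics.fmean(b)) / pooled_sd


def simulate(mechanism, n=20000, seed=1):
    """Return the SMD in age between patients with and without the confounder observed.

    Assumptions: 'MCAR' uses a constant missingness probability; 'MAR' lets the
    probability depend on the fully observed covariate through a logistic curve.
    """
    rng = random.Random(seed)
    observed, missing = [], []
    for _ in range(n):
        age = rng.gauss(70, 8)  # fully observed covariate (synthetic)
        if mechanism == "MCAR":
            p_missing = 0.4
        else:  # MAR
            p_missing = 1 / (1 + math.exp(-(age - 70) / 8))
        (missing if rng.random() < p_missing else observed).append(age)
    return smd(observed, missing)


if __name__ == "__main__":
    print("MCAR SMD:", round(simulate("MCAR"), 3))
    print("MAR  SMD:", round(simulate("MAR"), 3))
```

In this toy setting the SMD stays close to zero under MCAR and becomes clearly larger under MAR, which is the qualitative pattern the diagnostic is meant to pick up. An MNAR mechanism that depends only on the unobserved value itself would not necessarily show up in this check.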
Affiliation(s)
- Janick Weberpals
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Sudha R Raman
  - Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
- Pamela A Shaw
  - Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
- Hana Lee
  - Office of Biostatistics, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Massimiliano Russo
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Bradley G Hammill
  - Department of Population Health Sciences, Duke University School of Medicine, Durham, NC, USA
- Sengwee Toh
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
- John G Connolly
  - Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Kimberly J Dandreo
  - Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, MA, USA
- Fang Tian
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Wei Liu
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Jie Li
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- José J Hernández-Muñoz
  - Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA
- Robert J Glynn
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Rishi J Desai
  - Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
8
Zamanian A, von Kleist H, Ciora OA, Piperno M, Lancho G, Ahmidi N. Analysis of Missingness Scenarios for Observational Health Data. J Pers Med 2024; 14:514. [PMID: 38793096 PMCID: PMC11122060 DOI: 10.3390/jpm14050514] [Received: 04/04/2024] [Revised: 04/29/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024] Open
Abstract
Despite the extensive literature on missing data theory and cautionary articles emphasizing the importance of realistic analysis for healthcare data, a critical gap persists in incorporating domain knowledge into the missing data methods. In this paper, we argue that the remedy is to identify the key scenarios that lead to data missingness and investigate their theoretical implications. Based on this proposal, we first introduce an analysis framework where we investigate how different observation agents, such as physicians, influence the data availability and then scrutinize each scenario with respect to the steps in the missing data analysis. We apply this framework to the case study of observational data in healthcare facilities. We identify ten fundamental missingness scenarios and show how they influence the identification step for missing data graphical models, inverse probability weighting estimation, and exponential tilting sensitivity analysis. To emphasize how domain-informed analysis can improve method reliability, we conduct simulation studies under the influence of various missingness scenarios. We compare the results of three common methods in medical data analysis: complete-case analysis, Missforest imputation, and inverse probability weighting estimation. The experiments are conducted for two objectives: variable mean estimation and classification accuracy. We advocate for our analysis approach as a reference for the observational health data analysis. Beyond that, we also posit that the proposed analysis framework is applicable to other medical domains.
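Two of the methods compared in the abstract above, complete-case analysis and inverse probability weighting, can be contrasted on the variable mean estimation task with a small synthetic example. This is a sketch under stated assumptions: a single binary, fully observed covariate drives missingness (a MAR scenario chosen for illustration), observation probabilities are estimated by stratum proportions, and all names are hypothetical. It is not one of the paper's ten scenarios.

```python
import random


def simulate(n=50000, seed=3):
    """Return (complete-case mean, IPW mean) for an outcome whose true mean is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5                  # fully observed binary covariate
        y = rng.gauss(2.0 if x else 0.0, 1.0)   # outcome, true overall mean 1.0
        p_obs = 0.9 if x else 0.3               # observation probability depends on x only
        r = rng.random() < p_obs                # True when y is observed
        rows.append((x, y, r))

    obs = [(x, y) for x, y, r in rows if r]

    # Complete-case analysis: average the observed outcomes only
    cc_mean = sum(y for _, y in obs) / len(obs)

    # Estimate P(observed | x) within each stratum of x
    p_hat = {}
    for level in (False, True):
        stratum = [r for x, _, r in rows if x == level]
        p_hat[level] = sum(stratum) / len(stratum)

    # Normalised (Hajek) inverse probability weighted mean
    weights = [1 / p_hat[x] for x, _ in obs]
    ipw_mean = sum(w * y for w, (_, y) in zip(weights, obs)) / sum(weights)
    return cc_mean, ipw_mean


if __name__ == "__main__":
    cc, ipw = simulate()
    print("complete-case mean:", round(cc, 3))
    print("IPW mean:          ", round(ipw, 3))
```

Here the complete-case mean is pulled toward the better-observed stratum, while the weighted mean lands near the true value of 1.0. That advantage depends on the missingness model being correct, which is the kind of domain-informed assumption the abstract argues should be examined.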
Affiliation(s)
- Alireza Zamanian
  - Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, 85748 Munich, Germany
  - Fraunhofer Institute for Cognitive Systems IKS, 80686 Munich, Germany
- Henrik von Kleist
  - Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, 85748 Munich, Germany
  - Institute of Computational Biology, Helmholtz Center Munich, 80939 Munich, Germany
- Octavia-Andreea Ciora
  - Fraunhofer Institute for Cognitive Systems IKS, 80686 Munich, Germany
- Marta Piperno
  - Fraunhofer Institute for Cognitive Systems IKS, 80686 Munich, Germany
- Gino Lancho
  - Fraunhofer Institute for Cognitive Systems IKS, 80686 Munich, Germany
- Narges Ahmidi
  - Fraunhofer Institute for Cognitive Systems IKS, 80686 Munich, Germany
9
D'Agostino McGowan L, Lotspeich SC, Hepler SA. The "Why" behind including "Y" in your imputation model. Stat Methods Med Res 2024:9622802241244608. [PMID: 38625810 DOI: 10.1177/09622802241244608] [Indexed: 04/18/2024]
Abstract
Missing data is a common challenge when analyzing epidemiological data, and imputation is often used to address this issue. Here, we investigate the scenario where a covariate used in an analysis has missingness and will be imputed. There are recommendations to include the outcome from the analysis model in the imputation model for missing covariates, but it is not necessarily clear if this recommendation always holds and why this is sometimes true. We examine deterministic imputation (i.e. single imputation with fixed values) and stochastic imputation (i.e. single or multiple imputation with random values) methods and their implications for estimating the relationship between the imputed covariate and the outcome. We mathematically demonstrate that including the outcome variable in imputation models is not just a recommendation but a requirement to achieve unbiased results when using stochastic imputation methods. Moreover, we dispel common misconceptions about deterministic imputation models and demonstrate why the outcome should not be included in these models. This article aims to bridge the gap between imputation in theory and in practice, providing mathematical derivations to explain common statistical recommendations. We offer a better understanding of the considerations involved in imputing missing covariates and emphasize when it is necessary to include the outcome variable in the imputation model.
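The claim in the abstract above about stochastic imputation can be illustrated with a small simulation. This is a minimal sketch, assuming a linear outcome model with slope 1, a covariate missing completely at random for half the sample, and a single stochastic imputation that ignores parameter uncertainty; it is not the authors' derivation, and the function names are hypothetical.

```python
import random
import statistics


def ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n=20000, beta=1.0, seed=7):
    """Return the estimated slope of y on x after imputing x (without y, with y)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]
    miss = [rng.random() < 0.5 for _ in range(n)]  # covariate missing completely at random
    x_obs = [xi for xi, m in zip(x, miss) if not m]
    y_obs = [yi for yi, m in zip(y, miss) if not m]

    # (a) Stochastic imputation that ignores the outcome: draw from the marginal of x
    mu, sd = statistics.fmean(x_obs), statistics.stdev(x_obs)
    x_without_y = [rng.gauss(mu, sd) if m else xi for xi, m in zip(x, miss)]

    # (b) Stochastic imputation that includes the outcome: regress x on y among
    #     complete cases, then add residual noise to each prediction
    a, b = ols(y_obs, x_obs)
    resid_sd = statistics.stdev([xi - (a + b * yi) for xi, yi in zip(x_obs, y_obs)])
    x_with_y = [
        a + b * yi + rng.gauss(0, resid_sd) if m else xi
        for xi, yi, m in zip(x, y, miss)
    ]

    return ols(x_without_y, y)[1], ols(x_with_y, y)[1]


if __name__ == "__main__":
    slope_without, slope_with = simulate()
    print("slope, imputation without Y:", round(slope_without, 3))
    print("slope, imputation with Y:   ", round(slope_with, 3))
```

In this setup the slope estimated after imputing without the outcome is attenuated toward zero, while the version that conditions on the outcome recovers a slope close to the true value of 1. A full multiple imputation analysis would also propagate uncertainty in the imputation model parameters, which this sketch omits.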
Affiliation(s)
- Sarah C Lotspeich
- Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, USA
- Staci A Hepler
- Department of Statistical Sciences, Wake Forest University, Winston-Salem, NC, USA
10
Zhang J, Dashti SG, Carlin JB, Lee KJ, Moreno-Betancur M. Recoverability and estimation of causal effects under typical multivariable missingness mechanisms. Biom J 2024; 66:e2200326. [PMID: 38637322 DOI: 10.1002/bimj.202200326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Revised: 09/02/2023] [Accepted: 11/11/2023] [Indexed: 04/20/2024]
Abstract
In the context of missing data, the identifiability or "recoverability" of the average causal effect (ACE) depends not only on the usual causal assumptions but also on missingness assumptions that can be depicted by adding variable-specific missingness indicators to causal diagrams, creating missingness directed acyclic graphs (m-DAGs). Previous research described canonical m-DAGs, representing typical multivariable missingness mechanisms in epidemiological studies, and examined mathematically the recoverability of the ACE in each case. However, this work assumed no effect modification and did not investigate methods for estimation across such scenarios. Here, we extend this research by determining the recoverability of the ACE in settings with effect modification and conducting a simulation study to evaluate the performance of widely used missing data methods when estimating the ACE using correctly specified g-computation. Methods assessed were complete case analysis (CCA) and various implementations of multiple imputation (MI) with varying degrees of compatibility with the outcome model used in g-computation. Simulations were based on an example from the Victorian Adolescent Health Cohort Study (VAHCS), where interest was in estimating the ACE of adolescent cannabis use on mental health in young adulthood. We found that the ACE is recoverable when no incomplete variable (exposure, outcome, or confounder) causes its own missingness, and nonrecoverable otherwise, in simplified versions of 10 canonical m-DAGs that excluded unmeasured common causes of missingness indicators. Despite this nonrecoverability, simulations showed that MI approaches that are compatible with the outcome model in g-computation may enable approximately unbiased estimation across all canonical m-DAGs considered, except when the outcome causes its own missingness or causes the missingness of a variable that causes its own missingness. In the latter settings, researchers may need to consider sensitivity analysis methods incorporating external information (e.g., delta-adjustment methods). The VAHCS case study illustrates the practical implications of these findings.
Affiliation(s)
- Jiaxin Zhang
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Australia
- S Ghazaleh Dashti
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Australia
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Australia
- Katherine J Lee
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Australia
- Margarita Moreno-Betancur
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Australia
11
Weberpals J, Raman SR, Shaw PA, Lee H, Hammill BG, Toh S, Connolly JG, Dandreo KJ, Tian F, Liu W, Li J, Hernández-Muñoz JJ, Glynn RJ, Desai RJ. smdi: an R package to perform structural missing data investigations on partially observed confounders in real-world evidence studies. JAMIA Open 2024; 7:ooae008. [PMID: 38304248 PMCID: PMC10833461 DOI: 10.1093/jamiaopen/ooae008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 01/09/2024] [Accepted: 01/16/2024] [Indexed: 02/03/2024] Open
Abstract
Objectives Partially observed confounder data pose a major challenge in statistical analyses aimed to inform causal inference using electronic health records (EHRs). While analytic approaches such as imputation are available, assumptions on underlying missingness patterns and mechanisms must be verified. We aimed to develop a toolkit to streamline missing data diagnostics to guide choice of analytic approaches based on meeting necessary assumptions. Materials and methods We developed the smdi (structural missing data investigations) R package based on results of a previous simulation study which considered structural assumptions of common missing data mechanisms in EHR. Results smdi enables users to run principled missing data investigations on partially observed confounders and implement functions to visualize, describe, and infer potential missingness patterns and mechanisms based on observed data. Conclusions The smdi R package is freely available on CRAN and can provide valuable insights into underlying missingness patterns and mechanisms and thereby help improve the robustness of real-world evidence studies.
Affiliation(s)
- Janick Weberpals
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
- Sudha R Raman
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC 27701, United States
- Pamela A Shaw
- Biostatistics Division, Kaiser Permanente Washington Health Research Institute, Seattle, WA 98101, United States
- Hana Lee
- Office of Biostatistics, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
- Bradley G Hammill
- Department of Population Health Sciences, Duke University School of Medicine, Durham, NC 27701, United States
- Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA 02215, United States
- John G Connolly
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA 02215, United States
- Kimberly J Dandreo
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA 02215, United States
- Fang Tian
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
- Wei Liu
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
- Jie Li
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
- José J Hernández-Muñoz
- Office of Surveillance and Epidemiology, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD 20993, United States
- Robert J Glynn
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
- Rishi J Desai
- Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02120, United States
12
Guo F, Langworthy B, Ogino S, Wang M. Comparison between inverse-probability weighting and multiple imputation in Cox model with missing failure subtype. Stat Methods Med Res 2024; 33:344-356. [PMID: 38262434 DOI: 10.1177/09622802231226328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2024]
Abstract
Identifying and distinguishing risk factors for heterogeneous disease subtypes has been of great interest. However, missingness in disease subtypes is a common problem in those data analyses. Several methods have been proposed to deal with the missing data, including complete-case analysis, inverse-probability weighting, and multiple imputation. Although extant literature has compared these methods in missing problems, none has focused on the competing risk setting. In this paper, we discuss the assumptions required when complete-case analysis, inverse-probability weighting, and multiple imputation are used to deal with the missing failure subtype problem, focusing on how to implement these methods under various realistic scenarios in competing risk settings. Besides, we compare these three methods regarding their biases, efficiency, and robustness to model misspecifications using simulation studies. Our results show that complete-case analysis can be seriously biased when the missing completely at random assumption does not hold. Inverse-probability weighting and multiple imputation estimators are valid when we correctly specify the corresponding models for missingness and for imputation, and multiple imputation typically shows higher efficiency than inverse-probability weighting. However, in real-world studies, building imputation models for the missing subtypes can be more challenging than building missingness models. In that case, inverse-probability weighting could be preferred for its easy usage. We also propose two automated model selection procedures and demonstrate their usage in a study of the association between smoking and colorectal cancer subtypes in the Nurses' Health Study and Health Professional Follow-Up Study.
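The contrast between complete-case analysis and inverse-probability weighting can be sketched in a much simpler setting than the competing-risks Cox model studied in this paper. The example below is an assumption-laden illustration, not the authors' method: the estimand is a plain outcome mean, the probability of being observed depends only on a fully observed binary covariate, and the weights are estimated as stratum-specific observed proportions. The function name `compare_cc_and_ipw` is hypothetical.

```python
import numpy as np


def compare_cc_and_ipw(n=200_000, seed=0):
    """Estimate an outcome mean when outcomes are missing at random given x.

    Assumed setup (illustrative only): x is a fair binary covariate,
    y = 1 + 2x + N(0, 1), so the true mean of y is 2.0. The outcome is
    observed with probability 0.8 when x == 1 and 0.3 when x == 0.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    p_observed = np.where(x == 1, 0.8, 0.3)
    observed = rng.random(n) < p_observed

    # Complete-case estimate: ignores that observed records over-represent x == 1.
    complete_case = y[observed].mean()

    # Missingness model: observed proportion within each stratum of x.
    p_hat = np.empty(n)
    for level in (0, 1):
        in_stratum = x == level
        p_hat[in_stratum] = observed[in_stratum].mean()

    # Weighted mean with weights normalised to sum to one over observed records.
    weights = 1.0 / p_hat[observed]
    ipw = np.sum(weights * y[observed]) / np.sum(weights)

    return {"true_mean": 2.0, "complete_case": complete_case, "ipw": ipw}


if __name__ == "__main__":
    for name, value in compare_cc_and_ipw().items():
        print(f"{name:>14}: {value:.3f}")
```

In this setup the complete-case mean is pulled upward because records with `x == 1` are more likely to be observed, while the weighted estimate stays close to the true mean. This holds only because the missingness model is correctly specified, which mirrors the validity condition stated in the abstract.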
Affiliation(s)
- Fuyu Guo
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Shuji Ogino
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Cancer Immunology and Cancer Epidemiology Programs, Dana-Farber Harvard Cancer Center, Boston, MA, USA
- Program in MPE Molecular Pathological Epidemiology, Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Molin Wang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
13
Oberman HI, Vink G. Toward a standardized evaluation of imputation methodology. Biom J 2024; 66:e2200107. [PMID: 36932050 DOI: 10.1002/bimj.202200107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Revised: 02/01/2023] [Accepted: 02/08/2023] [Indexed: 03/19/2023]
Abstract
Developing new imputation methodology has become a very active field. Unfortunately, there is no consensus on how to perform simulation studies to evaluate the properties of imputation methods. In part, this may be due to different aims between fields and studies. For example, when evaluating imputation techniques aimed at prediction, different aims may be formulated than when statistical inference is of interest. The lack of consensus may also stem from different personal preferences or scientific backgrounds. All in all, the lack of common ground in evaluating imputation methodology may lead to suboptimal use in practice. In this paper, we propose a move toward a standardized evaluation of imputation methodology. To demonstrate the need for standardization, we highlight a set of possible pitfalls that bring forth a chain of potential problems in the objective assessment of the performance of imputation routines. Additionally, we suggest a course of action for simulating and evaluating missing data problems. Our suggested course of action is by no means meant to serve as a complete cookbook, but rather meant to incite critical thinking and a move to objective and fair evaluations of imputation methodology. We invite the readers of this paper to contribute to the suggested course of action.
Affiliation(s)
- Hanne I Oberman
- Department of Methodology & Statistics, Utrecht, The Netherlands
- Gerko Vink
- Department of Methodology & Statistics, Utrecht, The Netherlands
14
Mainzer RM, Nguyen CD, Carlin JB, Moreno‐Betancur M, White IR, Lee KJ. A comparison of strategies for selecting auxiliary variables for multiple imputation. Biom J 2024; 66:e2200291. [PMID: 38285405 PMCID: PMC7615727 DOI: 10.1002/bimj.202200291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 08/09/2023] [Accepted: 09/17/2023] [Indexed: 01/30/2024]
Abstract
Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include is not always straightforward. Several data-driven auxiliary variable selection strategies have been proposed, but there has been limited evaluation of their performance. Using a simulation study we evaluated the performance of eight auxiliary variable selection strategies: (1, 2) two versions of selection based on correlations in the observed data; (3) selection using hypothesis tests of the "missing completely at random" assumption; (4) replacing auxiliary variables with their principal components; (5, 6) forward and forward stepwise selection; (7) forward selection based on the estimated fraction of missing information; and (8) selection via the least absolute shrinkage and selection operator (LASSO). A complete case analysis and an MI analysis using all auxiliary variables (the "full model") were included for comparison. We also applied all strategies to a motivating case study. The full model outperformed all auxiliary variable selection strategies in the simulation study, with the LASSO strategy the best performing auxiliary variable selection strategy overall. All MI analysis strategies that we were able to apply to the case study led to similar estimates, although computational time was substantially reduced when variable selection was employed. This study provides further support for adopting an inclusive auxiliary variable strategy where possible. Auxiliary variable selection using the LASSO may be a promising alternative when the full model fails or is too burdensome.
Affiliation(s)
- Rheanna M. Mainzer
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- Cattram D. Nguyen
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- John B. Carlin
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, Australia
- Margarita Moreno‐Betancur
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- Ian R. White
- MRC Clinical Trials Unit, University College London, London, UK
- Katherine J. Lee
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
15
Horvat-Gitsels LA, Cortina-Borja M, Rahi JS. Do adolescents with impaired vision have different intentions and ambitions for their education, career and social outcomes compared to their peers? Findings from the Millennium Cohort Study. Br J Ophthalmol 2023; 108:159-164. [PMID: 36307166 DOI: 10.1136/bjo-2021-320972] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2021] [Accepted: 10/15/2022] [Indexed: 11/03/2022]
Abstract
BACKGROUND/AIMS To investigate if impaired vision adversely impacts the intentions/ambitions of adolescents concerning their future education, careers and social outcomes. METHODS Population-based birth cohort study in the UK comprising 9273 participants from the Millennium Cohort Study who were followed up to age 17 years. Children were classified as having normal vision or unilateral or bilateral impaired vision caused by significant eye conditions based on detailed parental-structured questionnaire data on sight problems and treatment coded by clinicians. Ten domains covering education, career and social outcomes by age 30 were investigated. RESULTS Adjusted regression models showed few differences by vision status. Bilateral impaired vision was associated with increased odds of intending to remain in full-time education after statutory school age (adjusted OR (aOR) 2.00, 95% CI 1.08 to 3.68) and of home ownership at age 30 (aOR 1.83, 95% CI 1.01 to 3.32). Impaired vision was not associated with intending to attend university. A significantly higher proportion of parents of children with bilateral or unilateral impaired vision thought that their child would not get the exam grades required to go to university than parents of those with normal vision (29% or 26% vs 16%, p=0.026). CONCLUSION Adolescents with impaired vision have broadly the same intentions/ambitions regarding future education, careers and social outcomes as their peers with normal vision. The known significant gaps in attainment in these domains among young adults with vision impairment are therefore likely to be due to barriers that they face in achieving their ambitions. Improved implementation of existing interventions is necessary to ensure equality of opportunities.
Affiliation(s)
- Lisanne A Horvat-Gitsels
- Population, Policy and Practice Research and Teaching Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Ulverscroft Vision Research Group, Great Ormond Street Hospital for Children NHS Foundation Trust, University College London, London, UK
- Mario Cortina-Borja
- Population, Policy and Practice Research and Teaching Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Jugnoo S Rahi
- Population, Policy and Practice Research and Teaching Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Ulverscroft Vision Research Group, Great Ormond Street Hospital for Children NHS Foundation Trust, University College London, London, UK
- Ophthalmology Department, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Institute of Ophthalmology, University College London, London, UK
- NIHR Moorfields Biomedical Research Centre, London, UK
16
Selman C, Mainzer R, Lee K, Anderson P, Burnett A, Garland SM, Patton GC, Pigdon L, Roberts G, Wark J, Doyle LW, Cheong JLY. Health-related quality of life in adults born extremely preterm or with extremely low birth weight in the postsurfactant era: a longitudinal cohort study. Arch Dis Child Fetal Neonatal Ed 2023; 108:581-587. [PMID: 36997308 DOI: 10.1136/archdischild-2022-325230] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/12/2022] [Accepted: 03/20/2023] [Indexed: 04/01/2023]
Abstract
OBJECTIVES To compare health-related quality of life (HRQoL) at 25 and 18 years in individuals born extremely preterm (EP, <28 weeks' gestation) or with extremely low birth weight (ELBW, birth weight <1000 g) with term-born (≥37 weeks) controls. Within the EP/ELBW cohort, to determine whether HRQoL differed between those with lower and higher IQs. METHODS HRQoL was self-reported using the Health Utilities Index Mark 3 (HUI3) at 18 and 25 years by 297 EP/ELBW and 251 controls born in 1991-1992 in Victoria, Australia. Median differences (MDs) between groups were estimated using multiple imputation to handle missing data. RESULTS Adults born EP/ELBW had lower HRQoL (median utility 0.89) at 25 years than controls (median utility 0.93, MD -0.040), but with substantial uncertainty in the estimate (95% CI -0.088 to 0.008) and a smaller reduction at 18 years (MD -0.016, 95% CI -0.061 to 0.029). On individual HUI3 items, there was suboptimal performance on speech (OR 9.28, 95% CI 3.09 to 27.93) and dexterity (OR 5.44, 95% CI 1.04 to 28.45) in the EP/ELBW cohort. Within the EP/ELBW cohort, individuals with lower IQ had lower HRQoL compared with those with higher IQ at 25 (MD -0.031, 95% CI -0.126 to 0.064) and 18 years (MD -0.034, 95% CI -0.107 to 0.040), but again with substantial uncertainty in the estimates. CONCLUSIONS Compared with term-born controls, young adults born EP/ELBW reported poorer HRQoL, as did those with lower IQ compared with those with higher IQ in the EP/ELBW cohort. Given the uncertainties, our findings need corroboration.
Affiliation(s)
- Christopher Selman
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Rheanna Mainzer
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Katherine Lee
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Peter Anderson
- Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- School of Psychological Sciences, University of Melbourne, Parkville, Victoria, Australia
- Alice Burnett
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Premature Infant Follow-Up Program, Royal Women's Hospital, Melbourne, Victoria, Australia
- Suzanne M Garland
- Department of Obstetrics and Gynaecology, Royal Women's Hospital, Melbourne, Victoria, Australia
- Women's Centre for Infectious Diseases, Royal Women's Hospital, Melbourne, Victoria, Australia
- Infection and Immunity, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- George C Patton
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Centre for Adolescent Health, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Lauren Pigdon
- Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Newborn Research, Royal Women's Hospital, Parkville, Victoria, Australia
- Gehan Roberts
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Centre for Community Child Health, Royal Children's Hospital, Parkville, Victoria, Australia
- Population Health, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- John Wark
- Department of Medicine, Royal Melbourne Hospital, The University of Melbourne, Parkville, Victoria, Australia
- Bone and Mineral Medicine, Department of Diabetes and Endocrinology, Royal Melbourne Hospital, Melbourne, Victoria, Australia
- Lex W Doyle
- Department of Paediatrics, University of Melbourne, Parkville, Victoria, Australia
- Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Premature Infant Follow-Up Program, Royal Women's Hospital, Melbourne, Victoria, Australia
- Department of Obstetrics and Gynaecology, Royal Women's Hospital, Melbourne, Victoria, Australia
- Jeanie Ling Yoong Cheong
- Clinical Sciences, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Department of Obstetrics and Gynaecology, Royal Women's Hospital, Melbourne, Victoria, Australia
- Newborn Research, Royal Women's Hospital, Parkville, Victoria, Australia
17
Lee KJ, Carlin JB, Simpson JA, Moreno-Betancur M. Assumptions and analysis planning in studies with missing data in multiple variables: moving beyond the MCAR/MAR/MNAR classification. Int J Epidemiol 2023; 52:1268-1275. [PMID: 36779333 PMCID: PMC10396404 DOI: 10.1093/ije/dyad008] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/05/2021] [Accepted: 01/24/2023] [Indexed: 02/14/2023] Open
Abstract
Researchers faced with incomplete data are encouraged to consider whether their data are 'missing completely at random' (MCAR), 'missing at random' (MAR) or 'missing not at random' (MNAR) when planning their analysis. However, there are two major problems with this classification as originally defined by Rubin in the 1970s. First, when there are missing data in multiple variables, the plausibility of the MAR assumption is difficult to assess using substantive knowledge and is more stringent than is generally appreciated. Second, although MCAR and MAR are sufficient conditions for consistent estimation with specific methods, they are not necessary conditions and therefore this categorization does not directly determine the best approach for handling the missing data in an analysis. How best to handle missing data depends on the assumed causal relationships between variables and their missingness, and what these relationships imply in terms of the 'recoverability' of the target estimand (the population parameter that encodes the answer to the underlying research question). Recoverability is defined as whether the estimand can be consistently estimated from the patterns and associations in the observed data without needing to invoke external information on the extent to which the distribution of missing values might differ from that of observed values. In this manuscript we outline an approach for deciding which method to use to handle multivariable missing data in an analysis, using directed acyclic graphs to depict missingness assumptions and determining the implications in terms of recoverability of the target estimand.
Affiliation(s)
- Katherine J Lee
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Melbourne, Australia
- Department of Paediatrics, University of Melbourne, Australia
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Melbourne, Australia
- Department of Paediatrics, University of Melbourne, Australia
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia
- Julie A Simpson
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia
- Margarita Moreno-Betancur
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Melbourne, Australia
- Department of Paediatrics, University of Melbourne, Australia
18
Zhang J, Dashti SG, Carlin JB, Lee KJ, Moreno-Betancur M. Should multiple imputation be stratified by exposure group when estimating causal effects via outcome regression in observational studies? BMC Med Res Methodol 2023; 23:42. [PMID: 36797679 PMCID: PMC9933305 DOI: 10.1186/s12874-023-01843-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2022] [Accepted: 01/16/2023] [Indexed: 02/18/2023] Open
Abstract
BACKGROUND Despite recent advances in causal inference methods, outcome regression remains the most widely used approach for estimating causal effects in epidemiological studies with a single-point exposure and outcome. Missing data are common in these studies, and complete-case analysis (CCA) and multiple imputation (MI) are two frequently used methods for handling them. In randomised controlled trials (RCTs), it has been shown that MI should be conducted separately by treatment group. In observational studies, causal inference is now understood as the task of emulating an RCT, which raises the question of whether MI should be conducted by exposure group in such studies. METHODS We addressed this question by evaluating the performance of seven methods for handling missing data when estimating causal effects with outcome regression. We conducted an extensive simulation study based on an illustrative case study from the Victorian Adolescent Health Cohort Study, assessing a range of scenarios, including seven outcome generation models with exposure-confounder interactions of differing strength. RESULTS The simulation results showed that MI by exposure group led to the least bias when the size of the smallest exposure group was relatively large, followed by MI approaches that included the exposure-confounder interactions. CONCLUSIONS The findings from our simulation study, which was designed based on a real case study, suggest that current practice for the conduct of MI in causal inference may need to shift to stratifying by exposure group where feasible, or otherwise including exposure-confounder interactions in the imputation model.
Affiliation(s)
- Jiaxin Zhang
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia.
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, 50 Flemington Road, 3052, Parkville, Australia.
- S Ghazaleh Dashti
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, 50 Flemington Road, 3052, Parkville, Australia
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, 50 Flemington Road, 3052, Parkville, Australia
- Katherine J Lee
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, 50 Flemington Road, 3052, Parkville, Australia
- Margarita Moreno-Betancur
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, Australia
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, 50 Flemington Road, 3052, Parkville, Australia
| |
Collapse
|
19
|
Moreno-Betancur M, Lynch JW, Pilkington RM, Schuch HS, Gialamas A, Sawyer MG, Chittleborough CR, Schurer S, Gurrin LC. Emulating a target trial of intensive nurse home visiting in the policy-relevant population using linked administrative data. Int J Epidemiol 2023; 52:119-131. [PMID: 35588223 PMCID: PMC9908050 DOI: 10.1093/ije/dyac092] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/06/2021] [Accepted: 04/21/2022] [Indexed: 11/14/2022]
Abstract
BACKGROUND Populations willing to participate in randomized trials may not correspond well to policy-relevant target populations. Evidence of effectiveness that is complementary to randomized trials may be obtained by combining the 'target trial' causal inference framework with whole-of-population linked administrative data. METHODS We demonstrate this approach in an evaluation of the South Australian Family Home Visiting Program, a nurse home visiting programme targeting socially disadvantaged families. Using de-identified data from 2004-10 in the ethics-approved Better Evidence Better Outcomes Linked Data (BEBOLD) platform, we characterized the policy-relevant population and emulated a trial evaluating effects on child developmental vulnerability at 5 years (n = 4160) and academic achievement at 9 years (n = 6370). Linkage to seven health, welfare and education data sources allowed adjustment for 29 confounders using Targeted Maximum Likelihood Estimation (TMLE) with SuperLearner. Sensitivity analyses assessed robustness to analytical choices. RESULTS We demonstrated how the target trial framework may be used with linked administrative data to generate evidence for an intervention as it is delivered in practice in the community in the policy-relevant target population, and considering effects on outcomes years down the track. The target trial lens also aided in understanding and limiting the increased measurement, confounding and selection bias risks arising with such data. Substantively, we did not find robust evidence of a meaningful beneficial intervention effect. CONCLUSIONS This approach could be a valuable avenue for generating high-quality, policy-relevant evidence that is complementary to trials, particularly when the target populations are multiply disadvantaged and less likely to participate in trials.
Affiliation(s)
- Margarita Moreno-Betancur
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children’s Research Institute, Parkville, VIC, Australia
  - Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, VIC, Australia
- John W Lynch
  - School of Public Health, University of Adelaide, Adelaide, SA, Australia
  - Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
  - Bristol Medical School, Population Health Sciences, University of Bristol, Bristol, UK
- Rhiannon M Pilkington
  - School of Public Health, University of Adelaide, Adelaide, SA, Australia
  - Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Helena S Schuch
  - School of Public Health, University of Adelaide, Adelaide, SA, Australia
  - Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
  - Postgraduate programme in Dentistry, Federal University of Pelotas, Pelotas, Brazil
- Angela Gialamas
  - School of Public Health, University of Adelaide, Adelaide, SA, Australia
  - Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Michael G Sawyer
  - Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
  - School of Medicine, University of Adelaide, Adelaide, SA, Australia
- Catherine R Chittleborough
  - School of Public Health, University of Adelaide, Adelaide, SA, Australia
  - Robinson Research Institute, University of Adelaide, Adelaide, SA, Australia
- Stefanie Schurer
  - School of Economics, University of Sydney, Sydney, NSW, Australia
- Lyle C Gurrin
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Parkville, VIC, Australia
|
20
|
Mainzer R, Moreno-Betancur M, Nguyen C, Simpson J, Carlin J, Lee K. Handling of missing data with multiple imputation in observational studies that address causal questions: protocol for a scoping review. BMJ Open 2023; 13:e065576. [PMID: 36725096 PMCID: PMC9896184 DOI: 10.1136/bmjopen-2022-065576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
INTRODUCTION Observational studies in health-related research often aim to answer causal questions. Missing data are common in these studies and often occur in multiple variables, such as the exposure, outcome and/or variables used to control for confounding. The standard classification of missing data as missing completely at random, missing at random (MAR) or missing not at random does not allow for a clear assessment of missingness assumptions when missingness arises in more than one variable. This presents challenges for selecting an analytic approach and determining when a sensitivity analysis under plausible alternative missing data assumptions is required. This is particularly pertinent with multiple imputation (MI), which is often justified by assuming data are MAR. The objective of this scoping review is to examine the use of MI in observational studies that address causal questions, with a focus on if and how (a) missingness assumptions are expressed and assessed, (b) missingness assumptions are used to justify the choice of a complete case analysis and/or MI for handling missing data and (c) sensitivity analyses under alternative plausible assumptions about the missingness mechanism are conducted. METHODS AND ANALYSIS We will review observational studies that aim to answer causal questions and use MI, published between January 2019 and December 2021 in five top general epidemiology journals. Studies will be identified using a full text search for the term 'multiple imputation' and then assessed for eligibility. Information extracted will include details about the study characteristics, missing data, missingness assumptions and MI implementation. Data will be summarised using descriptive statistics. ETHICS AND DISSEMINATION Ethics approval is not required for this review because data will be collected only from published studies. The results will be disseminated through a peer reviewed publication and conference presentations. 
TRIAL REGISTRATION NUMBER This protocol is registered on figshare (https://doi.org/10.6084/m9.figshare.20010497.v1).
Affiliation(s)
- Rheanna Mainzer
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- Margarita Moreno-Betancur
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- Cattram Nguyen
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- Julie Simpson
  - School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
- John Carlin
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - School of Population and Global Health, University of Melbourne, Parkville, Victoria, Australia
- Katherine Lee
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
|
21
|
Madley-Dowd P, Thomas R, Boyd A, Zammit S, Heron J, Rai D. Intellectual disability in the children of the Avon Longitudinal Study of Parents and Children (ALSPAC). Wellcome Open Res 2023; 7:172. [PMID: 37333842 PMCID: PMC10276197 DOI: 10.12688/wellcomeopenres.17803.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/06/2023] [Indexed: 08/03/2023]
Abstract
Background: Intellectual disability (ID) describes a neurodevelopmental condition involving impaired cognitive and functional ability. Here, we describe a multisource variable of ID using data from the Avon Longitudinal Study of Parents and Children (ALSPAC). Methods: The multisource indicator variable for ID was derived from i) IQ scores less than 70 measured at age 8 and at age 15, ii) free text fields from parent reported questionnaires, iii) school reported provision of educational services for individuals with a statement of special educational needs for cognitive impairments, iv) relevant READ codes contained in GP records, v) international classification of disease diagnoses contained in electronic hospital records and hospital episode statistics and vi) recorded interactions with mental health services for ID contained within the mental health services data set. A case of ID was identified if two or more sources indicated ID. A second indicator, labelled as "probable ID", was created by relaxing the cut-off in IQ scores to be less than 85. An indicator variable for known causes of ID was also created to aid in aetiological studies where ID with a known cause may need to be excluded. Results: 158 of 14,370 participants (1.10%) were indicated as having ID by two or more sources and 449 (3.12%) were indicated as having probable ID when the criterion for IQ scores was relaxed to less than 85. There were 476 participants (3.31%) with 1 or fewer sources of available information on ID; these participants had their multisource variable set to missing. The number of cases of ID with known cause was 31 (0.22% of the cohort, 19.6% of those with ID). Conclusions: The multisource variable of ID can be used in future analyses on ID in ALSPAC children.
Affiliation(s)
- Paul Madley-Dowd
  - Centre for Academic Mental Health, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Richard Thomas
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Andy Boyd
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Stanley Zammit
  - Centre for Academic Mental Health, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - NIHR Biomedical Research Centre, University of Bristol, Bristol, BS8 2BN, UK
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, CF24 4HQ, UK
- Jon Heron
  - Centre for Academic Mental Health, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
- Dheeraj Rai
  - Centre for Academic Mental Health, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, BS8 2BN, UK
  - NIHR Biomedical Research Centre, University of Bristol, Bristol, BS8 2BN, UK
  - Avon and Wiltshire Partnership NHS Mental Health Trust, University of Bristol, Bristol, BA1 3QE, UK
|
22
|
Forster F, Heumann C, Schaub B, Böck A, Nowak D, Vogelberg C, Radon K. Parental occupational exposures prior to conception and offspring wheeze and eczema during first year of life. Ann Epidemiol 2023; 77:90-97. [PMID: 36476404 DOI: 10.1016/j.annepidem.2022.11.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 08/12/2022] [Revised: 11/22/2022] [Accepted: 11/28/2022] [Indexed: 12/12/2022]
Abstract
PURPOSE Parental exposures prior to conception might influence asthma and allergy risk in offspring. As occupational exposures are established risk factors for asthma and allergies, we investigated if parental occupational exposures prior to conception cause wheeze and eczema in offspring during the first year of life. METHODS We analysed data of 436 families from an offspring cohort based on a follow-up study of German participants of the International Study of Asthma and Allergies in Childhood (ISAAC). Offspring cohort data was collected between 2009 and 2019. Occupational exposures were based on participants' work histories and measured by a Job-Exposure-Matrix. We used Bayesian logistic regression models for analysis. Inference and confounder selection were based on directed acyclic graphs. RESULTS In mothers, for both allergic and irritative occupational exposures prior to conception suggestive effects on offspring eczema during the first year of life were found (allergens: odds ratio (OR) 1.22, 95% compatibility interval (CI) 0.92-1.57; irritants: OR 1.36, 95% CI 0.99-1.77), while no relation with wheeze was suggested. CONCLUSIONS Our results suggest that reduction of asthma-related occupational exposures might not only reduce the burden of disease for occupationally induced or aggravated asthma and allergies in employees but also in their children.
Affiliation(s)
- Felix Forster
  - Institute and Clinic for Occupational, Social and Environmental Medicine, University Hospital, LMU Munich, Munich, Germany
- Bianca Schaub
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany; Comprehensive Pneumology Center (CPC) Munich, German Center for Lung Research (DZL), Munich, Germany
- Andreas Böck
  - Department of Pediatrics, Dr. von Hauner Children's Hospital, University Hospital, LMU Munich, Munich, Germany
- Dennis Nowak
  - Institute and Clinic for Occupational, Social and Environmental Medicine, University Hospital, LMU Munich, Munich, Germany; Comprehensive Pneumology Center (CPC) Munich, German Center for Lung Research (DZL), Munich, Germany
- Christian Vogelberg
  - Department of Pediatrics, University Hospital Dresden, Technical University, Dresden, Germany
- Katja Radon
  - Institute and Clinic for Occupational, Social and Environmental Medicine, University Hospital, LMU Munich, Munich, Germany; Comprehensive Pneumology Center (CPC) Munich, German Center for Lung Research (DZL), Munich, Germany
|
23
|
Wijesuriya R, Moreno‐Betancur M, Carlin JB, De Silva AP, Lee KJ. Evaluation of approaches for accommodating interactions and non-linear terms in multiple imputation of incomplete three-level data. Biom J 2022; 64:1404-1425. [PMID: 34914127 PMCID: PMC10174217 DOI: 10.1002/bimj.202000343] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/12/2020] [Revised: 04/19/2021] [Accepted: 06/05/2021] [Indexed: 12/14/2022]
Abstract
Three-level data structures arising from repeated measures on individuals clustered within larger units are common in health research studies. Missing data are prominent in such studies and are often handled via multiple imputation (MI). Although several MI approaches can be used to account for the three-level structure, including adaptations to single- and two-level approaches, when the substantive analysis model includes interactions or quadratic effects, these too need to be accommodated in the imputation model. In such analyses, substantive model compatible (SMC) MI has shown great promise in the context of single-level data. Although there have been recent developments in multilevel SMC MI, to date only one approach that explicitly handles incomplete three-level data is available. Alternatively, researchers can use pragmatic adaptations to single- and two-level MI approaches, or two-level SMC-MI approaches. We describe the available approaches and evaluate them via simulations in the context of three three-level random effects analysis models involving an interaction between the incomplete time-varying exposure and time, an interaction between the time-varying exposure and an incomplete time-fixed confounder, or a quadratic effect of the exposure. Results showed that all approaches considered performed well in terms of bias and precision when the target analysis involved an interaction with time, but the three-level SMC MI approach performed best when the target analysis involved an interaction between the time-varying exposure and an incomplete time-fixed confounder, or a quadratic effect of the exposure. We illustrate the methods using data from the Childhood to Adolescence Transition Study.
Affiliation(s)
- Rushani Wijesuriya
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Margarita Moreno‐Betancur
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- John B. Carlin
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
- Anurika P. De Silva
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, Victoria, Australia
- Katherine J. Lee
  - Department of Paediatrics, Faculty of Medicine, Dentistry and Health Sciences, The University of Melbourne, Parkville, Victoria, Australia
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
|
24
|
Witte J, Foraita R, Didelez V. Multiple imputation and test-wise deletion for causal discovery with incomplete cohort data. Stat Med 2022; 41:4716-4743. [PMID: 35908775 DOI: 10.1002/sim.9535] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/09/2021] [Revised: 06/12/2022] [Accepted: 07/11/2022] [Indexed: 11/08/2022]
Abstract
Causal discovery algorithms estimate causal graphs from observational data. This can provide a valuable complement to analyses focusing on the causal relation between individual treatment-outcome pairs. Constraint-based causal discovery algorithms rely on conditional independence testing when building the graph. Until recently, these algorithms have been unable to handle missing values. In this article, we investigate two alternative solutions: test-wise deletion and multiple imputation. We establish necessary and sufficient conditions for the recoverability of causal structures under test-wise deletion, and argue that multiple imputation is more challenging in the context of causal discovery than for estimation. We conduct an extensive comparison by simulating from benchmark causal graphs: as one might expect, we find that test-wise deletion and multiple imputation both clearly outperform list-wise deletion and single imputation. Crucially, our results further suggest that multiple imputation is especially useful in settings with a small number of either Gaussian or discrete variables, but when the dataset contains a mix of both neither method is uniformly best. The methods we compare include random forest imputation and a hybrid procedure combining test-wise deletion and multiple imputation. An application to data from the IDEFICS cohort study on diet- and lifestyle-related diseases in European children serves as an illustrating example.
Affiliation(s)
- Janine Witte
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany; Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
- Ronja Foraita
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Vanessa Didelez
  - Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany; Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
|
25
|
Madley-Dowd P, Thomas R, Boyd A, Zammit S, Heron J, Rai D. Intellectual disability in the children of the Avon Longitudinal Study of Parents and Children (ALSPAC). Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.17803.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
Abstract
Background: Intellectual disability (ID) describes a neurodevelopmental condition involving impaired cognitive and functional ability. Here, we describe a multisource variable of ID using data from the Avon Longitudinal Study of Parents and Children (ALSPAC). Methods: The multisource indicator variable for ID was derived from i) IQ scores less than 70 measured at age 8 and at age 15, ii) free text fields from parent reported questionnaires, iii) school reported provision of educational services for individuals with a statement of special educational needs for cognitive impairments, iv) relevant READ codes contained in GP records, v) international classification of disease diagnoses contained in electronic hospital records and hospital episode statistics and vi) recorded interactions with mental health services for ID contained within the mental health services data set. A case of ID was identified if two or more sources indicated ID. A second indicator, labelled as “probable ID”, was created by relaxing the cut-off in IQ scores to be less than 85. An indicator variable for known causes of ID was also created to aid in aetiological studies where ID with a known cause may need to be excluded. Results: 158 of 14,370 participants (1.10%) were indicated as having ID by two or more sources and 449 (3.12%) were indicated as having probable ID when the criterion for IQ scores was relaxed to less than 85. There were 476 participants (3.31%) with 1 or fewer sources of available information on ID; these participants had their multisource variable set to missing. The number of cases of ID with known cause was 31 (0.22%). Conclusions: The multisource variable of ID can be used in future analyses on ID in ALSPAC children.
|
26
|
Mainzer R, Apajee J, Nguyen CD, Carlin JB, Lee KJ. A comparison of multiple imputation strategies for handling missing data in multi-item scales: Guidance for longitudinal studies. Stat Med 2021; 40:4660-4674. [PMID: 34102709 DOI: 10.1002/sim.9088] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/03/2020] [Revised: 04/20/2021] [Accepted: 05/25/2021] [Indexed: 01/28/2023]
Abstract
Medical research often involves using multi-item scales to assess individual characteristics, disease severity, and other health-related outcomes. It is common to observe missing data in the scale scores, due to missing data in one or more items that make up that score. Multiple imputation (MI) is a popular method for handling missing data. However, it is not clear how best to use MI in the context of scale scores, particularly when they are assessed at multiple waves of data collection resulting in large numbers of items. The aim of this article is to provide practical advice on how to impute missing values in a repeatedly measured multi-item scale using MI when inference on the scale score is of interest. We evaluated the performance of five MI strategies for imputing missing data at either the item or scale level using simulated data and a case study based on four waves of the Longitudinal Study of Australian Children (LSAC). MI was implemented using both multivariate normal imputation and fully conditional specification, with two rules for calculating the scale score. A complete case analysis was also performed for comparison. Based on our results, we caution against the use of a MI strategy that does not include the scale score in the imputation model(s) when the scale score is required for analysis.
Affiliation(s)
- Rheanna Mainzer
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia
- Jemishabye Apajee
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia; Quality Use of Medicines and Pharmacy Research Centre, Clinical and Health Sciences, University of South Australia, Adelaide, South Australia, Australia
- Cattram D Nguyen
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia; Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
- John B Carlin
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia; Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, Australia
- Katherine J Lee
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, Victoria, Australia; Department of Paediatrics, The University of Melbourne, Parkville, Victoria, Australia
|
27
|
Lee KJ, Tilling KM, Cornish RP, Little RJA, Bell ML, Goetghebeur E, Hogan JW, Carpenter JR. Framework for the treatment and reporting of missing data in observational studies: The Treatment And Reporting of Missing data in Observational Studies framework. J Clin Epidemiol 2021; 134:79-88. [PMID: 33539930 PMCID: PMC8168830 DOI: 10.1016/j.jclinepi.2021.01.008] [Citation(s) in RCA: 131] [Impact Index Per Article: 43.7] [Received: 09/08/2020] [Revised: 12/15/2020] [Accepted: 01/13/2021] [Indexed: 12/17/2022]
Abstract
Missing data are ubiquitous in medical research. Although there is increasing guidance on how to handle missing data, practice is changing slowly and misapprehensions abound, particularly in observational research. Importantly, the lack of transparency around methodological decisions is threatening the validity and reproducibility of modern research. We present a practical framework for handling and reporting the analysis of incomplete data in observational studies, which we illustrate using a case study from the Avon Longitudinal Study of Parents and Children. The framework consists of three steps: 1) Develop an analysis plan specifying the analysis model and how missing data are going to be addressed. An important consideration is whether a complete records' analysis is likely to be valid, whether multiple imputation or an alternative approach is likely to offer benefits and whether a sensitivity analysis regarding the missingness mechanism is required; 2) Examine the data, checking the methods outlined in the analysis plan are appropriate, and conduct the preplanned analysis; and 3) Report the results, including a description of the missing data, details on how the missing data were addressed, and the results from all analyses, interpreted in light of the missing data and the clinical relevance. This framework seeks to support researchers in thinking systematically about missing data and transparently reporting the potential effect on the study results, therefore increasing the confidence in and reproducibility of research findings.
Affiliation(s)
- Katherine J Lee
  - Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Melbourne, Australia; Department of Paediatrics, University of Melbourne, Melbourne, Australia
- Kate M Tilling
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Rosie P Cornish
  - MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
- Melanie L Bell
  - Department of Epidemiology and Biostatistics, University of Arizona, AZ, USA
- Els Goetghebeur
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
- James R Carpenter
  - MRC Clinical Trials Unit, London School of Hygiene and Tropical Medicine, London, UK
|
28
|
Ross RK, Breskin A, Westreich D. When Is a Complete-Case Approach to Missing Data Valid? The Importance of Effect-Measure Modification. Am J Epidemiol 2020; 189:1583-1589. [PMID: 32601706 DOI: 10.1093/aje/kwaa124] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 09/11/2019] [Revised: 06/22/2020] [Accepted: 06/23/2020] [Indexed: 12/19/2022]
Abstract
When estimating causal effects, careful handling of missing data is needed to avoid bias. Complete-case analysis is commonly used in epidemiologic analyses. Previous work has shown that covariate-stratified effect estimates from complete-case analysis are unbiased when missingness is independent of the outcome conditional on the exposure and covariates. Here, we assess the bias of complete-case analysis for adjusted marginal effects when confounding is present under various causal structures of missing data. We show that estimation of the marginal risk difference requires an unbiased estimate of the unconditional joint distribution of confounders and any other covariates required for conditional independence of missingness and outcome. The dependence of missing data on these covariates must be considered to obtain a valid estimate of the covariate distribution. If none of these covariates are effect-measure modifiers on the absolute scale, however, the marginal risk difference will equal the stratified risk differences and the complete-case analysis will be unbiased when the stratified effect estimates are unbiased. Estimation of unbiased marginal effects in complete-case analysis therefore requires close consideration of causal structure and effect-measure modification.
|
29
|
Wijesuriya R, Moreno-Betancur M, Carlin JB, Lee KJ. Evaluation of approaches for multiple imputation of three-level data. BMC Med Res Methodol 2020; 20:207. [PMID: 32787781 PMCID: PMC7422505 DOI: 10.1186/s12874-020-01079-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 02/20/2020] [Accepted: 07/12/2020] [Indexed: 12/30/2022]
Abstract
Background Three-level data arising from repeated measures on individuals who are clustered within larger units are common in health research studies. Missing data are prominent in such longitudinal studies and multiple imputation (MI) is a popular approach for handling missing data. Extensions of joint modelling and fully conditional specification MI approaches based on multilevel models have been developed for imputing three-level data. Alternatively, it is possible to extend single- and two-level MI methods to impute three-level data using dummy indicators and/or by analysing repeated measures in wide format. However, most implementations, evaluations and applications of these approaches focus on the context of incomplete two-level data. It is currently unclear which approach is preferable for imputing three-level data. Methods In this study, we investigated the performance of various MI methods for imputing three-level incomplete data when the target analysis model is a three-level random effects model with a random intercept for each level. The MI methods were evaluated via simulations and illustrated using empirical data, based on a case study from the Childhood to Adolescence Transition Study, a longitudinal cohort collecting repeated measures on students who were clustered within schools. In our simulations we considered a number of different scenarios covering a range of different missing data mechanisms, missing data proportions and strengths of level-2 and level-3 intra-cluster correlations. Results We found that all of the approaches considered produced valid inferences about both the regression coefficient corresponding to the exposure of interest and the variance components under the various scenarios within the simulation study. In the case study, all approaches led to similar results. Conclusion Researchers may use extensions to the single- and two-level approaches, or the three-level approaches, to adequately handle incomplete three-level data. 
The two-level MI approaches with dummy indicator extension or the MI approaches based on three-level models will be required in certain circumstances such as when there are longitudinal data measured at irregular time intervals. However, the single- and two-level approaches with the DI extension should be used with caution as the DI approach has been shown to produce biased parameter estimates in certain scenarios.
Affiliation(s)
- Rushani Wijesuriya
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, VIC, 3052, Australia
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, VIC, 3052, Australia
- Margarita Moreno-Betancur
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, VIC, 3052, Australia
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, VIC, 3052, Australia
- John B Carlin
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, VIC, 3052, Australia
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, VIC, 3052, Australia
- Katherine J Lee
- Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute, Parkville, VIC, 3052, Australia
- Clinical Epidemiology and Biostatistics Unit, Department of Paediatrics, University of Melbourne, Parkville, VIC, 3052, Australia
|
30
|
Smith LH. Selection Mechanisms and Their Consequences: Understanding and Addressing Selection Bias. Curr Epidemiol Rep 2020. [DOI: 10.1007/s40471-020-00241-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
|
31
|
Abstract
Graphical models are useful tools in causal inference, and causal directed acyclic graphs (DAGs) are used extensively to determine the variables for which it is sufficient to control for confounding to estimate causal effects. We discuss the following ten pitfalls and tips that are easily overlooked when using DAGs: 1) Each node on DAGs corresponds to a random variable and not its realized values; 2) The presence or absence of arrows in DAGs corresponds to the presence or absence of individual causal effect in the population; 3) “Non-manipulable” variables and their arrows should be drawn with care; 4) It is preferable to draw DAGs for the total population, rather than for the exposed or unexposed groups; 5) DAGs are primarily useful to examine the presence of confounding in distribution in the notion of confounding in expectation; 6) Although DAGs provide qualitative differences of causal structures, they cannot describe details of how to adjust for confounding; 7) DAGs can be used to illustrate the consequences of matching and the appropriate handling of matched variables in cohort and case-control studies; 8) When explicitly accounting for temporal order in DAGs, it is necessary to use separate nodes for each timing; 9) In certain cases, DAGs with signed edges can be used in drawing conclusions about the direction of bias; and 10) DAGs can be (and should be) used to describe not only confounding bias but also other forms of bias. We also discuss recent developments of graphical models and their future directions.
Affiliation(s)
- Etsuji Suzuki
- Department of Epidemiology, Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University
- Tomohiro Shinozaki
- Department of Information and Computer Technology, Faculty of Engineering, Tokyo University of Science
|
32
|
Schomaker M, Kühne F, Siebert U. RE: "EFFECT ESTIMATES IN RANDOMIZED TRIALS AND OBSERVATIONAL STUDIES: COMPARING APPLES WITH APPLES". Am J Epidemiol 2020; 189:77-78. [PMID: 31529036 DOI: 10.1093/aje/kwz194] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/23/2018] [Revised: 06/21/2018] [Accepted: 07/04/2018] [Indexed: 11/13/2022]
Affiliation(s)
- Michael Schomaker
- Institute of Public Health, Medical Decision Making and Health Technology Assessment, Department of Public Health, Health Services Research, and Health Technology Assessment, UMIT—University for Health Sciences, Medical Informatics and Technology, Hall in Tirol, Austria
- Centre for Infectious Disease Epidemiology and Research, University of Cape Town, Cape Town, South Africa
- Felicitas Kühne
- Institute of Public Health, Medical Decision Making and Health Technology Assessment, Department of Public Health, Health Services Research, and Health Technology Assessment, UMIT—University for Health Sciences, Medical Informatics and Technology, Hall in Tirol, Austria
- Uwe Siebert
- Institute of Public Health, Medical Decision Making and Health Technology Assessment, Department of Public Health, Health Services Research, and Health Technology Assessment, UMIT—University for Health Sciences, Medical Informatics and Technology, Hall in Tirol, Austria
- Centre for Infectious Disease Epidemiology and Research, University of Cape Town, Cape Town, South Africa
- Center for Health Decision Science, Department of Health Policy and Management, T.H. Chan School of Public Health, Harvard University, Boston, MA
- Program on Cardiovascular Research, Institute for Technology Assessment and Department of Radiology, Massachusetts General Hospital, Harvard Medical School, Boston, MA
- Division of Health Technology Assessment and Bioinformatics, ONCOTYROL—Center for Personalized Cancer Medicine, Innsbruck, Austria
|
33
|
Vandormael A, Tanser F, Cuadros D, Dobra A. Estimating trends in the incidence rate with interval censored data and time-dependent covariates. Stat Methods Med Res 2019; 29:272-281. [PMID: 30782096 DOI: 10.1177/0962280219829892] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Abstract
We propose a multiple imputation method for estimating the incidence rate with interval censored data and time-dependent (and/or time-independent) covariates. The method has two stages. First, we use a semi-parametric G-transformation model to estimate the cumulative baseline hazard function and the effects of the time-dependent (and/or time-independent) covariates on the interval censored infection times. Second, we derive the participant's unique cumulative distribution function and impute infection times conditional on the covariate values. To assess performance, we simulated infection times from a Cox proportional hazards model and induced interval censoring by varying the testing rate, e.g., participants test 100%, 75%, 50% of the time, etc. We then compared the incidence rate estimates from our G-imputation approach with single random-point and mid-point imputation. By comparison, our G-imputation approach gave more accurate incidence rate estimates and appropriate standard errors for models with time-independent covariates only, time-dependent covariates only, and a mixture of time-dependent and time-independent covariates across various testing rates. We demonstrate, for the first time, a multiple imputation approach for incidence rate estimation with interval censored data and time-dependent (and/or time-independent) covariates.
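As a rough illustration of the two comparator methods named in the abstract (mid-point and single random-point imputation), the sketch below imputes an event time inside each censoring interval and computes an incidence rate as events divided by person-time. It does not implement the authors' G-imputation approach. The data layout, function names, and follow-up values are assumptions made for this example: each record is a hypothetical (last negative test time, first positive test time) pair, with `None` for participants who never tested positive.

```python
import random


def midpoint_impute(last_negative, first_positive):
    """Place the event at the middle of the censoring interval."""
    return (last_negative + first_positive) / 2.0


def random_point_impute(last_negative, first_positive, rng):
    """Place the event at a uniformly drawn point in the censoring interval."""
    return rng.uniform(last_negative, first_positive)


def incidence_rate(records, impute):
    """Events per unit person-time, with follow-up starting at time 0.

    records: list of (last_negative, first_positive) pairs; first_positive is
    None when no event was observed (follow-up ends at last_negative).
    impute: function mapping an interval to a single imputed event time.
    """
    events = 0
    person_time = 0.0
    for last_negative, first_positive in records:
        if first_positive is None:
            person_time += last_negative
        else:
            events += 1
            person_time += impute(last_negative, first_positive)
    return events / person_time


# Hypothetical follow-up data in years.
records = [(2.0, 4.0), (1.0, None), (3.0, 5.0), (6.0, None)]

rate_midpoint = incidence_rate(records, midpoint_impute)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
rate_random = incidence_rate(
    records, lambda left, right: random_point_impute(left, right, rng)
)

print(rate_midpoint, rate_random)
```

A single random-point draw gives one of many possible rates between the bounds set by the interval endpoints, which is one reason the abstract contrasts these single-imputation methods with a multiple imputation approach.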
Affiliation(s)
- Alain Vandormael
- School of Nursing and Public Health, University of KwaZulu-Natal, KwaZulu-Natal, South Africa
- Africa Health Research Institute (AHRI), KwaZulu-Natal, South Africa
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), University of KwaZulu-Natal, KwaZulu-Natal, South Africa
- Frank Tanser
- School of Nursing and Public Health, University of KwaZulu-Natal, KwaZulu-Natal, South Africa
- Africa Health Research Institute (AHRI), KwaZulu-Natal, South Africa
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), University of KwaZulu-Natal, KwaZulu-Natal, South Africa
- Research Department of Infection & Population Health, University College London, London, UK
- Diego Cuadros
- Department of Geography and Geographic Information Science, University of Cincinnati, Cincinnati, OH, USA
- Adrian Dobra
- Department of Statistics, Center for Statistics and the Social Sciences, and Center for Studies in Demography and Ecology, University of Washington, Seattle, WA, USA
|