1
|
Shin YE, Saegusa T. Nested case-control sampling without replacement. LIFETIME DATA ANALYSIS 2024:10.1007/s10985-024-09633-y. [PMID: 39235702 DOI: 10.1007/s10985-024-09633-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 08/09/2024] [Indexed: 09/06/2024]
Abstract
Nested case-control design (NCC) is a cost-effective outcome-dependent design in epidemiology that collects all cases and a fixed number of controls at the time of case diagnosis from a large cohort. Due to inefficiency relative to full cohort studies, previous research developed various estimation methodologies but changing designs in the formulation of risk sets was considered only in view of potential bias in the partial likelihood estimation. In this paper, we study a modified design that excludes previously selected controls from risk sets in view of efficiency improvement as well as bias. To this end, we extend the inverse probability weighting method of Samuelsen which was shown to outperform the partial likelihood estimator in the standard setting. We develop its asymptotic theory and a variance estimation of both regression coefficients and the cumulative baseline hazard function that takes account of the complex feature of the modified sampling design. In addition to good finite sample performance of variance estimation, simulation studies show that the modified design with the proposed estimator is more efficient than the standard design. Examples are provided using data from NIH-AARP Diet and Health Cohort Study.
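The sampling scheme summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `sample_ncc`, its parameters, and the rule that a subject remains in the risk set while their follow-up time is at least the case's event time are assumptions made for illustration. With `exclude_selected=True`, previously selected controls are dropped from later risk sets, which is the modified design the abstract describes.

```python
import random

def sample_ncc(times, events, m, exclude_selected=False, seed=0):
    """Return {case_index: [control indices]} for a nested case-control sample.

    times  : follow-up time per subject
    events : 1 if the subject is a case at `times[i]`, else 0
    m      : number of controls sampled per case
    exclude_selected : if True, earlier controls leave later risk sets
                       (hypothetical flag for the modified design)
    """
    rng = random.Random(seed)
    used = set()      # controls already selected at earlier case times
    sampled = {}
    # Visit cases in order of their event times.
    case_order = sorted((i for i, e in enumerate(events) if e),
                        key=lambda i: times[i])
    for i in case_order:
        # Risk set: everyone still under follow-up at the case's event time,
        # excluding the case itself.
        risk_set = [j for j in range(len(times))
                    if j != i and times[j] >= times[i]]
        if exclude_selected:
            risk_set = [j for j in risk_set if j not in used]
        # Sample without replacement within the risk set.
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        used.update(controls)
        sampled[i] = controls
    return sampled
```

Under the modified design a late case may receive fewer than `m` controls once its risk set is depleted; the sketch simply takes whatever remains.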
Affiliation(s)
| | - Takumi Saegusa
- University of Maryland, College Park, Maryland, United States
| |
|
2
|
Etievant L, Gail MH. Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data. LIFETIME DATA ANALYSIS 2024; 30:572-599. [PMID: 38565754 PMCID: PMC11420370 DOI: 10.1007/s10985-024-09621-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Accepted: 01/30/2024] [Indexed: 04/04/2024]
Abstract
The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase efficiency of estimates of Cox model log-relative hazards, and there has been some work estimating pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently online to analyze these various options. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used "robust" variance estimate of Barlow (Biometrics 50:1064-1072, 1994) is appropriate. The corresponding R software, CaseCohortCoxSurvival, facilitates analysis with and without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow for phase-two data to be missing at random for stratified designs. We provide inference not only for log-relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
Affiliation(s)
- Lola Etievant
- Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
| | - Mitchell H Gail
- Division of Cancer Epidemiology and Genetics, Biostatistics Branch, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD, 20850-9780, USA.
| |
|
3
|
Middleton M, Nguyen C, Carlin JB, Moreno-Betancur M, Lee KJ. On the use of multiple imputation to address data missing by design as well as unintended missing data in case-cohort studies with a binary endpoint. BMC Med Res Methodol 2023; 23:287. [PMID: 38062377 PMCID: PMC10702035 DOI: 10.1186/s12874-023-02090-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Accepted: 11/02/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND Case-cohort studies are conducted within cohort studies, with the defining feature that collection of exposure data is limited to a subset of the cohort, leading to a large proportion of missing data by design. Standard analysis uses inverse probability weighting (IPW) to address this intended missing data, but little research has been conducted into how best to perform analysis when there is also unintended missingness. Multiple imputation (MI) has become a default standard for handling unintended missingness and is typically used in combination with IPW to handle the intended missingness due to the case-control sampling. Alternatively, MI could be used to handle both the intended and unintended missingness. While the performance of an MI-only approach has been investigated in the context of a case-cohort study with a time-to-event outcome, it is unclear how this approach performs with a binary outcome. METHODS We conducted a simulation study to assess and compare the performance of approaches using only MI, only IPW, and a combination of MI and IPW, for handling intended and unintended missingness in the case-cohort setting. We also applied the approaches to a case study. RESULTS Our results show that the combined approach is approximately unbiased for estimation of the exposure effect when the sample size is large, and was the least biased with small sample sizes, while MI-only and IPW-only exhibited larger biases in both sample size settings. CONCLUSIONS These findings suggest that a combined MI/IPW approach should be preferred to handle intended and unintended missing data in case-cohort studies with binary outcomes.
Affiliation(s)
- Melissa Middleton
- Clinical Epidemiology & Biostatistics Unit, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Australia.
- Department of Paediatrics, The University of Melbourne, 50 Flemington Rd, Parkville, VIC, 3052, Australia.
| | - Cattram Nguyen
- Clinical Epidemiology & Biostatistics Unit, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Australia
- Department of Paediatrics, The University of Melbourne, 50 Flemington Rd, Parkville, VIC, 3052, Australia
| | - John B Carlin
- Clinical Epidemiology & Biostatistics Unit, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Australia
- Department of Paediatrics, The University of Melbourne, 50 Flemington Rd, Parkville, VIC, 3052, Australia
| | - Margarita Moreno-Betancur
- Clinical Epidemiology & Biostatistics Unit, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Australia
- Department of Paediatrics, The University of Melbourne, 50 Flemington Rd, Parkville, VIC, 3052, Australia
| | - Katherine J Lee
- Clinical Epidemiology & Biostatistics Unit, Murdoch Children's Research Institute, Royal Children's Hospital, Melbourne, Australia
- Department of Paediatrics, The University of Melbourne, 50 Flemington Rd, Parkville, VIC, 3052, Australia
| |
|
4
|
Zhong W, Diao G. Joint semiparametric models for case-cohort designs. Biometrics 2023; 79:1959-1971. [PMID: 35917392 DOI: 10.1111/biom.13728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Accepted: 07/20/2022] [Indexed: 11/28/2022]
Abstract
Two-phase studies such as case-cohort and nested case-control studies are widely used cost-effective sampling strategies. In the first phase, the observed failure/censoring time and inexpensive exposures are collected. In the second phase, a subgroup of subjects is selected for measurements of expensive exposures based on the information from the first phase. One challenging issue is how to utilize all the available information to conduct efficient regression analyses of the two-phase study data. This paper proposes a joint semiparametric modeling of the survival outcome and the expensive exposures. Specifically, we assume a class of semiparametric transformation models and a semiparametric density ratio model for the survival outcome and the expensive exposures, respectively. The class of semiparametric transformation models includes the proportional hazards model and the proportional odds model as special cases. The density ratio model is flexible in modeling multivariate mixed-type data. We develop efficient likelihood-based estimation and inference procedures and establish the large sample properties of the nonparametric maximum likelihood estimators. Extensive numerical studies reveal that the proposed methods perform well under practical settings. The proposed methods also appear to be reasonably robust under various model mis-specifications. An application to the National Wilms Tumor Study is provided.
Affiliation(s)
- Weibin Zhong
- Global Biometrics & Data Sciences, Bristol Myers Squibb, Berkeley Heights, New Jersey, USA
| | - Guoqing Diao
- Department of Biostatistics and Bioinformatics, The George Washington University, Washington, District of Columbia, USA
| |
|
5
|
He JR, Hirst JE, Tikellis G, Phillips GS, Ramakrishnan R, Paltiel O, Ponsonby AL, Klebanoff M, Olsen J, Murphy MFG, Håberg SE, Lemeshow S, F Olsen S, Qiu X, Magnus P, Golding J, Ward MH, Wiemels JL, Rahimi K, Linet MS, Dwyer T. Common maternal infections during pregnancy and childhood leukaemia in the offspring: findings from six international birth cohorts. Int J Epidemiol 2022; 51:769-777. [PMID: 34519790 PMCID: PMC9425514 DOI: 10.1093/ije/dyab199] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/20/2020] [Indexed: 11/12/2022]
Abstract
BACKGROUND Previous epidemiological studies have found positive associations between maternal infections and childhood leukaemia; however, evidence from prospective cohort studies is scarce. We aimed to examine the associations using large-scale prospective data. METHODS Data were pooled from six population-based birth cohorts in Australia, Denmark, Israel, Norway, the UK and the USA (recruitment 1950s-2000s). Primary outcomes were any childhood leukaemia and acute lymphoblastic leukaemia (ALL); secondary outcomes were acute myeloid leukaemia (AML) and any childhood cancer. Exposures included maternal self-reported infections [influenza-like illness, common cold, any respiratory tract infection, vaginal thrush, vaginal infections and urinary tract infection (including cystitis)] and infection-associated symptoms (fever and diarrhoea) during pregnancy. Covariate-adjusted hazard ratio (HR) and 95% confidence interval (CI) were estimated using multilevel Cox models. RESULTS Among 312 879 children with a median follow-up of 13.6 years, 167 leukaemias, including 129 ALL and 33 AML, were identified. Maternal urinary tract infection was associated with increased risk of any leukaemia [HR (95% CI) 1.68 (1.10-2.58)] and subtypes ALL [1.49 (0.87-2.56)] and AML [2.70 (0.93-7.86)], but not with any cancer [1.13 (0.85-1.51)]. Respiratory tract infection was associated with increased risk of any leukaemia [1.57 (1.06-2.34)], ALL [1.43 (0.94-2.19)], AML [2.37 (1.10-5.12)] and any cancer [1.33 (1.09-1.63)]; influenza-like illness showed a similar pattern but with less precise estimates. There was no evidence of a link between other infections and any outcomes. CONCLUSIONS Urinary tract and respiratory tract infections during pregnancy may be associated with childhood leukaemia, but the absolute risk is small given the rarity of the outcome.
Affiliation(s)
- Jian-Rong He
- Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK
- Division of Birth Cohort Study, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou, China
- George Institute for Global Health, University of Oxford, Oxford, UK
| | - Jane E Hirst
- Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK
- George Institute for Global Health, University of Oxford, Oxford, UK
| | - Gabriella Tikellis
- Murdoch Children’s Research Institute, Royal Children’s Hospital, University of Melbourne, Melbourne, VIC, Australia
| | - Gary S Phillips
- Retired from Center for Biostatistics, Department of Biomedical Informatics, Ohio State University, Columbus, OH, USA
| | - Rema Ramakrishnan
- Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK
- George Institute for Global Health, University of Oxford, Oxford, UK
- University of New South Wales, Faculty of Medicine, Sydney, NSW, Australia
| | - Ora Paltiel
- Braun School of Public Health, Hadassah-Hebrew University Medical Center, Jerusalem, Israel
| | - Anne-Louise Ponsonby
- Murdoch Children’s Research Institute, Royal Children’s Hospital, University of Melbourne, Melbourne, VIC, Australia
| | - Mark Klebanoff
- Center for Perinatal Research, Abigail Wexner Research Institute at Nationwide Children's Hospital, Columbus, OH, USA
| | - Jørn Olsen
- Department of Clinical Epidemiology, Aarhus University, Aarhus, Denmark
| | - Michael F G Murphy
- Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK
| | - Siri E Håberg
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
| | - Stanley Lemeshow
- Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH, USA
| | - Sjurdur F Olsen
- Centre for Fetal Programming, Department of Epidemiology Research, Statens Serum Institut, Copenhagen, Denmark
| | - Xiu Qiu
- Division of Birth Cohort Study, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University, Guangzhou, China
| | - Per Magnus
- Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
| | - Jean Golding
- Centre for Academic Child Health, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, UK
| | - Mary H Ward
- Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD, USA
| | - Joseph L Wiemels
- Department of Preventative Medicine, University of Southern California, Los Angeles, CA, USA
| | - Kazem Rahimi
- Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford, UK
- George Institute for Global Health, University of Oxford, Oxford, UK
| | - Martha S Linet
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA
| | - Terence Dwyer
- Corresponding author. Nuffield Department of Women’s and Reproductive Health, University of Oxford, Oxford OX3 9DU, UK. E-mail:
| | | |
|
6
|
Middleton M, Nguyen C, Moreno-Betancur M, Carlin JB, Lee KJ. Evaluation of multiple imputation approaches for handling missing covariate information in a case-cohort study with a binary outcome. BMC Med Res Methodol 2022; 22:87. [PMID: 35369860 PMCID: PMC8978363 DOI: 10.1186/s12874-021-01495-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/20/2021] [Accepted: 12/15/2021] [Indexed: 11/21/2022]
Abstract
BACKGROUND In case-cohort studies a random subcohort is selected from the inception cohort and acts as the sample of controls for several outcome investigations. Analysis is conducted using only the cases and the subcohort, with inverse probability weighting (IPW) used to account for the unequal sampling probabilities resulting from the study design. Like all epidemiological studies, case-cohort studies are susceptible to missing data. Multiple imputation (MI) has become increasingly popular for addressing missing data in epidemiological studies. It is currently unclear how best to incorporate the weights from a case-cohort analysis in MI procedures used to address missing covariate data. METHOD A simulation study was conducted with missingness in two covariates, motivated by a case study within the Barwon Infant Study. MI methods considered were: using the outcome, a proxy for weights in the simple case-cohort design considered, as a predictor in the imputation model, with and without exposure and covariate interactions; imputing separately within each weight category; and using a weighted imputation model. These methods were compared to a complete case analysis (CCA) within the context of a standard IPW analysis model estimating either the risk or odds ratio. The strength of associations, missing data mechanism, proportion of observations with incomplete covariate data, and subcohort selection probability varied across the simulation scenarios. Methods were also applied to the case study. RESULTS There was similar performance in terms of relative bias and precision with all MI methods across the scenarios considered, with expected improvements compared with the CCA. Slight underestimation of the standard error was seen throughout but the nominal level of coverage (95%) was generally achieved. All MI methods showed a similar increase in precision as the subcohort selection probability increased, irrespective of the scenario. A similar pattern of results was seen in the case study. CONCLUSIONS How weights were incorporated into the imputation model had minimal effect on the performance of MI; this may be due to case-cohort studies only having two weight categories. In this context, inclusion of the outcome in the imputation model was sufficient to account for the unequal sampling probabilities in the analysis model. SUPPLEMENTARY INFORMATION The online version contains supplementary material available at 10.1186/s12874-021-01495-4.
|
7
|
Vela J, Cordtz RL, Kristensen S, Torp-Pedersen C, Petersen KK, Arendt-Nielsen L, Dreyer L. Is pain associated with premature mortality in patients with psoriatic arthritis? A nested case-control study using the DANBIO Register. Rheumatology (Oxford) 2021; 60:5216-5223. [PMID: 33668054 DOI: 10.1093/rheumatology/keab192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2020] [Accepted: 02/14/2021] [Indexed: 11/13/2022]
Abstract
OBJECTIVES It has been hypothesized that the presence of chronic pain causes excess mortality. Since chronic pain is prevalent among patients with PsA this potential association should be explored. We aimed to investigate whether higher cumulative pain intensity is associated with an excess mortality risk in patients with PsA. METHODS A nested case-control study using data from the nationwide DANBIO Register (Danish Database for Biological Therapies in Rheumatology) and Danish healthcare registers. Cases were patients who died; at the date of death, each case was matched on sex, year of birth and calendar period with up to five controls. Exposure of interest was mean pain intensity reported during the time followed in routine rheumatology practice. Pain intensity was measured using a visual analogue scale from 0 to 100 and conditional logistic regression was used to calculate odds of mortality per 5-unit increase in pain while adjusting for confounders. RESULTS The cohort consisted of 8019 patients. A total of 276 cases were identified and matched with 1187 controls. Higher mean pain intensity was associated with increased odds of mortality [odds ratio 1.06 (95% CI 1.02, 1.10)] in the crude model, but there was no association [odds ratio 0.99 (95% CI 0.95, 1.03)] when adjusting for additional confounders. Factors shown to increase the odds of mortality were recent glucocorticoid use, concomitant chronic obstructive pulmonary disease, diabetes mellitus, cancer and cardiovascular disease. CONCLUSION These results indicate that experienced pain in itself is not associated with premature mortality in patients with PsA. However, recent glucocorticoid use and concurrent comorbidities were.
Affiliation(s)
- Jonathan Vela
- Department of Rheumatology
- Department of Clinical Medicine, Aalborg University, Aalborg
| | - Rene Lindholm Cordtz
- Department of Rheumatology
- Centre for Rheumatology and Spine diseases, Gentofte Hospital, Copenhagen
| | - Salome Kristensen
- Department of Rheumatology
- Department of Clinical Medicine, Aalborg University, Aalborg
| | | | - Kristian Kjær Petersen
- Centre for Sensory-Motor Interaction, Aalborg University, Aalborg
- Centre for Neuroplasticity and Pain (CNAP), SMI, Department of Health Science and Technology, School of Medicine, Aalborg University, Aalborg
| | | | - Lene Dreyer
- Department of Rheumatology
- Department of Clinical Medicine, Aalborg University, Aalborg
- DANBIO Register, Denmark
| |
|
8
|
Shin YE, Gail MH, Pfeiffer RM. Assessing risk model calibration with missing covariates. Biostatistics 2021; 23:875-890. [PMID: 33616159 DOI: 10.1093/biostatistics/kxaa060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2020] [Revised: 12/07/2020] [Accepted: 12/11/2020] [Indexed: 11/12/2022]
Abstract
When validating a risk model in an independent cohort, some predictors may be missing for some subjects. Missingness can be unplanned or by design, as in case-cohort or nested case-control studies, in which some covariates are measured only in subsampled subjects. Weighting methods and imputation are used to handle missing data. We propose methods to increase the efficiency of weighting to assess calibration of a risk model (i.e. bias in model predictions), which is quantified by the ratio of the number of observed events, $\mathcal{O}$, to expected events, $\mathcal{E}$, computed from the model. We adjust known inverse probability weights by incorporating auxiliary information available for all cohort members. We use survey calibration that requires the weighted sum of the auxiliary statistics in the complete data subset to equal their sum in the full cohort. We show that a pseudo-risk estimate that approximates the actual risk value but uses only variables available for the entire cohort is an excellent auxiliary statistic to estimate $\mathcal{E}$. We derive analytic variance formulas for $\mathcal{O}/\mathcal{E}$ with adjusted weights. In simulations, weight adjustment with pseudo-risk was much more efficient than inverse probability weighting and yielded consistent estimates even when the pseudo-risk was a poor approximation. Multiple imputation was often efficient but yielded biased estimates when the imputation model was misspecified. Using these methods, we assessed calibration of an absolute risk model for second primary thyroid cancer in an independent cohort.
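The calibration constraint described in this abstract (weighted sums of the auxiliary statistics in the complete-data subset must equal their full-cohort sums) can be sketched with linear calibration. This is only an illustration under stated assumptions: the abstract does not say which calibration distance the authors use, the auxiliary vector is taken to be `(1, r_i)` with `r_i` a pseudo-risk, and the function name `calibrate_weights` and its arguments are hypothetical.

```python
def calibrate_weights(d, r, n_cohort, r_total):
    """Linearly calibrate design weights d_i against auxiliaries x_i = (1, r_i).

    d        : inverse probability (design) weights for sampled subjects
    r        : pseudo-risk values for the same subjects (assumed auxiliary)
    n_cohort : full-cohort size, the target for sum(w_i)
    r_total  : full-cohort sum of pseudo-risks, the target for sum(w_i * r_i)
    Returns w_i = d_i * (1 + a + b * r_i) satisfying both targets.
    """
    # Design-weighted moments of the auxiliary vector.
    s0 = sum(d)
    s1 = sum(di * ri for di, ri in zip(d, r))
    s2 = sum(di * ri * ri for di, ri in zip(d, r))
    # Gaps between the full-cohort totals and the design-weighted totals.
    g0 = n_cohort - s0
    g1 = r_total - s1
    # Solve [[s0, s1], [s1, s2]] (a, b)^T = (g0, g1)^T by Cramer's rule.
    det = s0 * s2 - s1 * s1
    a = (g0 * s2 - g1 * s1) / det
    b = (s0 * g1 - s1 * g0) / det
    return [di * (1.0 + a + b * ri) for di, ri in zip(d, r)]
```

With calibrated weights `w`, an estimate of the expected count could be formed as a weighted sum of model risks over the sampled subjects; the sketch stops at the weights because the abstract does not give the risk model.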
Affiliation(s)
- Yei Eun Shin
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
| | - Mitchell H Gail
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
| | - Ruth M Pfeiffer
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, 9609 Medical Center Drive, Rockville, MD 20850, USA
| |
|
9
|
Kubale J, Balmaseda A, Sanchez N, Lopez R, Gresh L, Ojeda S, Harris E, Kuan G, Zelner J, Gordon A. Pneumonia following symptomatic influenza infection among Nicaraguan children before and after introduction of the pneumococcal conjugate vaccine. J Infect Dis 2020; 224:643-647. [PMID: 33351091 DOI: 10.1093/infdis/jiaa776] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2020] [Accepted: 12/16/2020] [Indexed: 11/14/2022]
Abstract
Influenza is associated with primary viral and secondary bacterial pneumonias; however, the dynamics of this relationship in populations with varied levels of pneumococcal vaccination remain unclear. We conducted nested matched case-control studies in two prospective cohorts of Nicaraguan children aged 2-14 years: one before PCV introduction (2008-2010) and one following its introduction and near universal adoption (2011-2018). The association between influenza and pneumonia was similar in both cohorts. Participants with influenza (across types/subtypes) had higher odds of developing pneumonia in the month following influenza infection. These findings underscore the importance of considering influenza in interventions to reduce global pneumonia burden.
Affiliation(s)
- John Kubale
- Department of Epidemiology, School of Public Health, University of Michigan, Michigan, USA
| | - Angel Balmaseda
- Laboratorio Nacional de Virología, Centro Nacional de Diagnóstico y Referencia, Ministry of Health, Managua, Nicaragua
| | - Nery Sanchez
- Sustainable Sciences Institute, Managua, Nicaragua
| | - Roger Lopez
- Laboratorio Nacional de Virología, Centro Nacional de Diagnóstico y Referencia, Ministry of Health, Managua, Nicaragua
| | - Lionel Gresh
- Sustainable Sciences Institute, Managua, Nicaragua
| | - Sergio Ojeda
- Sustainable Sciences Institute, Managua, Nicaragua
| | - Eva Harris
- Division of Infectious Diseases and Vaccinology, School of Public Health, University of California, Berkeley, Berkeley, California, USA
| | - Guillermina Kuan
- Health Center Sócrates Flores Vivas, Ministry of Health, Managua, Nicaragua
| | - Jon Zelner
- Department of Epidemiology, School of Public Health, University of Michigan, Michigan, USA
| | - Aubree Gordon
- Department of Epidemiology, School of Public Health, University of Michigan, Michigan, USA
| |
|
10
|
Pankhurst L, Mitra R, Kimber A, Collett D. Multiply imputing missing values arising by design in transplant survival data. Biom J 2020; 62:1192-1207. [PMID: 32077133 DOI: 10.1002/bimj.201800253] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2018] [Revised: 03/26/2019] [Accepted: 04/25/2019] [Indexed: 11/12/2022]
Abstract
In this article, we address a missing data problem that occurs in transplant survival studies. Recipients of organ transplants are followed up from transplantation and their survival times recorded, together with various explanatory variables. Due to differences in data collection procedures in different centers or over time, a particular explanatory variable (or set of variables) may only be recorded for certain recipients, which results in this variable being missing for a substantial number of records in the data. The variable may also turn out to be an important predictor of survival and so it is important to handle this missing-by-design problem appropriately. Consensus in the literature is to handle this problem with complete case analysis, as the missing data are assumed to arise under an appropriate missing at random mechanism that gives consistent estimates here. Specifically, the missing values can reasonably be assumed not to be related to the survival time. In this article, we investigate the potential for multiple imputation to handle this problem in a relevant study on survival after kidney transplantation, and show that it comprehensively outperforms complete case analysis on a range of measures. This is a particularly important finding in the medical context as imputing large amounts of missing data is often viewed with scepticism.
Affiliation(s)
- Laura Pankhurst
- Statistics and Clinical Studies, NHS Blood and Transplant, Bristol, UK
| | - Robin Mitra
- Department of Mathematics and Statistics, Lancaster University, Lancaster, UK
| | - Alan Kimber
- Mathematical Sciences, University of Southampton, Southampton, UK
| | - Dave Collett
- Statistics and Clinical Studies, NHS Blood and Transplant, Bristol, UK
| |
|
11
|
Shin YE, Pfeiffer RM, Graubard BI, Gail MH. Weight calibration to improve the efficiency of pure risk estimates from case‐control samples nested in a cohort. Biometrics 2020; 76:1087-1097. [DOI: 10.1111/biom.13209] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/24/2019] [Revised: 10/17/2019] [Accepted: 12/16/2019] [Indexed: 11/30/2022]
Affiliation(s)
- Yei Eun Shin
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
| | - Ruth M. Pfeiffer
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
| | - Barry I. Graubard
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
| | - Mitchell H. Gail
- Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland
| |
|
12
|
Cheng CY, Tseng WL, Chang CF, Chang CH, Gau SSF. A Deep Learning Approach for Missing Data Imputation of Rating Scales Assessing Attention-Deficit Hyperactivity Disorder. Front Psychiatry 2020; 11:673. [PMID: 32765316 PMCID: PMC7379397 DOI: 10.3389/fpsyt.2020.00673] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 03/18/2020] [Accepted: 06/29/2020] [Indexed: 02/03/2023]
Abstract
A variety of tools and methods have been used to measure behavioral symptoms of attention-deficit/hyperactivity disorder (ADHD). Missing data is a major concern in ADHD behavioral studies. This study used a deep learning method to impute missing data in ADHD rating scales and evaluated the ability of the imputed dataset (i.e., the imputed data replacing the original missing values) to distinguish youths with ADHD from youths without ADHD. The data were collected from 1220 youths, 799 of whom had an ADHD diagnosis, and 421 were typically developing (TD) youths without ADHD, recruited in Northern Taiwan. Participants were assessed using the Conners' Continuous Performance Test, the Chinese versions of the Conners' rating scale-revised: short form for parent and teacher reports, and the Swanson, Nolan, and Pelham, version IV scale for parent and teacher reports. We used deep learning, with information from the original complete dataset (referred to as the reference dataset), to perform missing data imputation and generate an imputation order according to the imputed accuracy of each question. We evaluated the effectiveness of imputation using support vector machine to classify the ADHD and TD groups in the imputed dataset. The imputed dataset can classify ADHD vs. TD up to 89% accuracy, which did not differ from the classification accuracy (89%) using the reference dataset. Most of the behaviors related to oppositional behaviors rated by teachers and hyperactivity/impulsivity rated by both parents and teachers showed high discriminatory accuracy to distinguish ADHD from non-ADHD. Our findings support a deep learning solution for missing data imputation without introducing bias to the data.
Affiliation(s)
- Chung-Yuan Cheng
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
- Department of Psychiatry, National Taiwan University Hospital and College of Medicine, Taipei, Taiwan
| | - Wan-Ling Tseng
- Child Study Center, Yale University School of Medicine, New Haven, CT, United States
| | - Ching-Fen Chang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
| | - Chuan-Hsiung Chang
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan
| | - Susan Shur-Fen Gau
- Department of Psychiatry, National Taiwan University Hospital and College of Medicine, Taipei, Taiwan
- Graduate Institute of Brain and Mind Sciences, and Graduate Institute of Clinical Medicine, College of Medicine, National Taiwan University, Taipei, Taiwan
| |
|