1
Ramgopal S, Belanger T, Lorenz D, Lipsett SC, Neuman MI, Liebovitz D, Florin TA. Preferences for Management of Pediatric Pneumonia: A Clinician Survey of Artificially Generated Patient Cases. Pediatr Emerg Care 2024:00006565-990000000-00488. [PMID: 38950412 DOI: 10.1097/pec.0000000000003231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]
Abstract
BACKGROUND It is unknown which factors are associated with chest radiograph (CXR) and antibiotic use for suspected community-acquired pneumonia (CAP) in children. We evaluated factors associated with CXR and antibiotic preferences among clinicians for children with suspected CAP using case scenarios generated through artificial intelligence (AI). METHODS We performed a survey of general pediatric, pediatric emergency medicine, and emergency medicine attending physicians employed by a private physician contractor. Respondents were given 5 unique, AI-generated case scenarios. We used generalized estimating equations to identify factors associated with CXR and antibiotic use. We evaluated the cluster-weighted correlation between clinician suspicion and clinical prediction model risk estimates for CAP using 2 predictive models. RESULTS A total of 172 respondents provided responses to 839 scenarios. Factors associated with CXR acquisition (OR [95% CI]) included presence of crackles (4.17 [2.19, 7.95]), prior pneumonia (2.38 [1.32, 4.20]), chest pain (1.90 [1.18, 3.05]), and fever (1.82 [1.32, 2.52]). Factors associated with the decision to use antibiotics before knowledge of CXR results included past hospitalization for pneumonia (4.24 [1.88, 9.57]), focal decreased breath sounds (3.86 [1.98, 7.52]), and crackles (3.45 [2.15, 5.53]). After revealing CXR results to clinicians, these results were the sole predictor associated with antibiotic decision-making. Suspicion for CAP correlated with one of 2 prediction models for CAP (Spearman's rho = 0.25). Factors associated with a greater suspicion of pneumonia included prior pneumonia, duration of illness, worsening course of illness, shortness of breath, vomiting, decreased oral intake or urinary output, respiratory distress, head nodding, focal decreased breath sounds, focal rhonchi, fever, crackles, and lower pulse oximetry.
CONCLUSIONS Ordering preferences for CXRs demonstrated similarities and differences with evidence-based risk models for CAP. Clinicians relied heavily on CXR findings to guide antibiotic ordering. These findings can be used within decision support systems to promote evidence-based management practices for pediatric CAP.
Affiliation(s)
- Sriram Ramgopal
  - From the Division of Emergency Medicine, Ann & Robert H. Lurie Children's Hospital of Chicago, Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL
- Douglas Lorenz
  - Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, KY
- Susan C Lipsett
  - Department of Pediatrics, Division of Emergency Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA
- Mark I Neuman
  - Department of Pediatrics, Division of Emergency Medicine, Boston Children's Hospital, Harvard Medical School, Boston, MA
- David Liebovitz
  - Department of General Internal Medicine, Northwestern University Feinberg School of Medicine, Northwestern University, Chicago, IL
- Todd A Florin
  - From the Division of Emergency Medicine, Ann & Robert H. Lurie Children's Hospital of Chicago, Department of Pediatrics, Northwestern University Feinberg School of Medicine, Chicago, IL
2
Brzezinski RY, Wasserman A, Sasson N, Stark M, Goldiner I, Rogowski O, Berliner S, Argov O. An Exploratory Analysis of Routine Ferritin Measurement Upon Admission and the Prognostic Implications of Low-Grade Ferritinemia During Inflammation. Am J Med 2024:S0002-9343(24)00277-8. [PMID: 38723929 DOI: 10.1016/j.amjmed.2024.04.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 06/10/2024]
Abstract
BACKGROUND Serum ferritin is usually measured in the presence of anemia or in suspected iron overload syndromes. Ferritin is also an acute-phase protein that is elevated during systemic inflammation. However, the prognostic value of routinely measuring ferritin upon admission to a medical facility is not clear. Therefore, we examined the association between ferritin concentrations measured at the time of hospital admission with 30-day and long-term mortality. METHODS We obtained routine ferritin measurements taken within 24 hours of admission in 2859 patients hospitalized in an internal medicine department. Multiple clinical and laboratory parameters were used to assess the association between ferritin and overall mortality during a median follow-up of 15 months (interquartile range [IQR] 8-22). RESULTS Ferritin levels were associated with increased 30-day mortality rates (odds ratio [OR] 1.04, 95% confidence interval [CI] 1.03-1.06) for each 100 ng/mL increase. Patients with intermediate (78-220 ng/mL) and high (>221 ng/mL) ferritin concentrations (2nd and 3rd tertiles) had higher 30-day mortality rates even after adjustment for age, sex, and existing comorbidities (OR 2.05, 95% CI 1.70-2.5). Long-term overall mortality rates demonstrated a similar pattern across ferritin tertiles (hazard ratio [HR] 1.54, 95% CI 1.39-1.71). CONCLUSIONS Routine admission ferritin concentrations are linearly and independently correlated with excess mortality risk in hospitalized patients, even those with apparently "normal" ferritin concentrations (<300 ng/mL). Thus, low-grade ferritinemia might not be an innocent finding in the context of the inflammatory response. Its potential biological and therapeutic implications warrant future research.
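To make the per-100 ng/mL odds ratio in this abstract easier to read, the following sketch rescales it to other ferritin increments. It assumes the usual log-linear form of a logistic regression coefficient (the odds ratio compounds multiplicatively with the size of the increase); the function name and the 300 ng/mL example are illustrative and do not come from the study.

```python
def odds_ratio_for_increase(or_per_unit: float, unit_size: float, increase: float) -> float:
    """Rescale an odds ratio reported per `unit_size` to a different increase.

    Assumes the log-odds are linear in the predictor, so the odds ratio
    for an increase of k units is the per-unit odds ratio raised to k.
    """
    if or_per_unit <= 0 or unit_size <= 0:
        raise ValueError("odds ratio and unit size must be positive")
    return or_per_unit ** (increase / unit_size)


if __name__ == "__main__":
    # Reported value: OR 1.04 per 100 ng/mL. Hypothetical increase: 300 ng/mL.
    print(round(odds_ratio_for_increase(1.04, 100, 300), 3))  # about 1.125
```

Under that assumption, a 300 ng/mL higher admission ferritin corresponds to roughly 12% higher odds of 30-day mortality; the same rescaling would apply to the confidence limits.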
Affiliation(s)
- Rafael Y Brzezinski
  - Internal Medicine "C" and "E", Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Asaf Wasserman
  - Internal Medicine "C" and "E", Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Noa Sasson
  - Internal Medicine "C" and "E", Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Moshe Stark
  - Division of Clinical Laboratories, Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ilana Goldiner
  - Division of Clinical Laboratories, Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ori Rogowski
  - Internal Medicine "C" and "E", Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Shlomo Berliner
  - Internal Medicine "C" and "E", Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Ori Argov
  - Internal Medicine "C" and "E", Tel Aviv Medical Center, Tel Aviv, Israel, affiliated with the Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
3
El Emam K, Mosquera L, Fang X, El-Hussuna A. An evaluation of the replicability of analyses using synthetic health data. Sci Rep 2024; 14:6978. [PMID: 38521806 PMCID: PMC10960851 DOI: 10.1038/s41598-024-57207-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 03/15/2024] [Indexed: 03/25/2024]
Abstract
Synthetic data generation is increasingly used as a privacy-preserving approach for sharing health data. In addition to protecting privacy, it is important to ensure that generated data has high utility. A common way to assess utility is the ability of synthetic data to replicate results from the real data. Replicability has been defined using two criteria: (a) replicate the results of the analyses on real data, and (b) ensure valid population inferences from the synthetic data. A simulation study using three heterogeneous real-world datasets evaluated the replicability of logistic regression workloads. Eight replicability metrics were evaluated: decision agreement, estimate agreement, standardized difference, confidence interval overlap, bias, confidence interval coverage, statistical power, and precision (empirical SE). The analysis of synthetic data used a multiple imputation approach whereby up to 20 datasets were generated and the fitted logistic regression models were combined using combining rules for fully synthetic datasets. The effects of synthetic data amplification were evaluated, and two types of generative models were used: sequential synthesis using boosted decision trees and a generative adversarial network (GAN). Privacy risk was evaluated using a membership disclosure metric. For sequential synthesis, adjusted model parameters after combining at least ten synthetic datasets gave high decision and estimate agreement, low standardized difference, high confidence interval overlap, low bias, nominal confidence interval coverage, and power close to the nominal level. Amplification had only a marginal benefit. Confidence interval coverage from a single synthetic dataset without applying combining rules was erroneous, and statistical power, as expected, was artificially inflated when amplification was used. Sequential synthesis performed considerably better than the GAN across multiple datasets. Membership disclosure risk was low for all datasets and models. For replicable results, the statistical analysis of fully synthetic data should be based on at least ten generated datasets of the same size as the original, whose analysis results are combined. Analysis results from synthetic data without applying combining rules can be misleading. Replicability results are dependent on the type of generative model used, with our study suggesting that sequential synthesis has good replicability characteristics for common health research workloads.
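As an illustration of what "combining rules for fully synthetic datasets" can look like in practice, here is a minimal sketch of the rule commonly attributed to Raghunathan, Reiter, and Rubin: the pooled estimate is the mean of the per-dataset estimates, and the pooled variance is (1 + 1/m) times the between-dataset variance minus the mean within-dataset variance. The abstract does not state which exact variant the authors applied, so treat this as an assumption. The function name, the input values, and the fallback for non-positive variance (valid only when the synthetic datasets have the same size as the original) are all illustrative.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Pool one coefficient across m fully synthetic datasets.

    estimates: per-dataset point estimates (e.g. a logistic regression coefficient)
    variances: per-dataset squared standard errors of that coefficient
    Returns (pooled_estimate, pooled_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two datasets with matching inputs")
    q_bar = mean(estimates)          # pooled point estimate
    b = variance(estimates)          # between-dataset variance (m - 1 denominator)
    u_bar = mean(variances)          # average within-dataset variance
    t_f = (1 + 1 / m) * b - u_bar    # fully synthetic combining rule
    if t_f <= 0:
        # Assumed fallback: the rule can go non-positive for small m;
        # use the mean within-dataset variance (same-size synthetic data).
        t_f = u_bar
    return q_bar, t_f


if __name__ == "__main__":
    # Hypothetical coefficients and variances from three synthetic datasets.
    est, var = combine_fully_synthetic([0.5, 0.7, 0.6], [0.001, 0.001, 0.001])
    print(est, var)
```

The subtraction of the within-dataset variance is what distinguishes this rule from the one used for ordinary multiple imputation, and it is one reason a single synthetic dataset cannot give a valid interval on its own.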
Affiliation(s)
- Khaled El Emam
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
  - Replica Analytics, Ottawa, ON, Canada
  - Children's Hospital of Eastern Ontario (CHEO) Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada
- Lucy Mosquera
  - Replica Analytics, Ottawa, ON, Canada
  - Children's Hospital of Eastern Ontario (CHEO) Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada
- Xi Fang
  - Replica Analytics, Ottawa, ON, Canada
4
Lun R, Siegal D, Ramsay T, Stotts G, Dowlatshahi D. Synthetic data in cancer and cerebrovascular disease research: A novel approach to big data. PLoS One 2024; 19:e0295921. [PMID: 38324588 PMCID: PMC10849264 DOI: 10.1371/journal.pone.0295921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 12/01/2023] [Indexed: 02/09/2024]
Abstract
OBJECTIVES Synthetic datasets are artificially manufactured based on real health systems data but do not contain real patient information. We sought to validate the use of synthetic data in stroke and cancer research by conducting a comparison study of cancer patients with ischemic stroke to non-cancer patients with ischemic stroke. DESIGN Retrospective cohort study. SETTING We used synthetic data generated by MDClone and compared it to its original source data (i.e. real patient data from the Ottawa Hospital Data Warehouse). OUTCOME MEASURES We compared key differences in demographics, treatment characteristics, length of stay, and costs between cancer patients with ischemic stroke and non-cancer patients with ischemic stroke. We used a binary, multivariable logistic regression model to identify risk factors for recurrent stroke in the cancer population. RESULTS Using synthetic data, we found cancer patients with ischemic stroke had a lower prevalence of hypertension (52.0% in the cancer cohort vs 57.7% in the non-cancer cohort, p<0.0001), and a higher prevalence of chronic obstructive pulmonary disease (COPD: 8.5% vs 4.7%, p<0.0001), prior ischemic stroke (1.7% vs 0.1%, p<0.001), and prior venous thromboembolism (VTE: 8.2% vs 1.5%, p<0.0001). They also had a longer length of stay (8 days [IQR 3-16] vs 6 days [IQR 3-13], p = 0.011), and higher costs associated with their stroke encounters: $11,498 (IQR $4,440-$20,668) in the cancer cohort vs $8,084 (IQR $3,947-$16,706) in the non-cancer cohort (p = 0.0061). A multivariable logistic regression model identified 5 predictors for recurrent ischemic stroke in the cancer cohort using synthetic data; 3 of the same predictors were identified using real patient data, with similar effect measures. Summary statistics between synthetic and original datasets did not significantly differ, other than slight differences in the distributions of frequencies for numeric data.
CONCLUSION We demonstrated the utility of synthetic data in stroke and cancer research and provided key differences between cancer and non-cancer patients with ischemic stroke. Synthetic data is a powerful tool that can allow researchers to easily explore hypothesis generation, enable data sharing without privacy breaches, and ensure broad access to big data in a rapid, safe, and reliable fashion.
Affiliation(s)
- Ronda Lun
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada
  - Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
  - Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Deborah Siegal
  - School of Epidemiology, University of Ottawa, Ottawa, Canada
  - Division of Hematology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Tim Ramsay
  - School of Epidemiology, University of Ottawa, Ottawa, Canada
- Grant Stotts
  - Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
- Dar Dowlatshahi
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Canada
  - Clinical Epidemiology Program, Ottawa Hospital Research Institute, Ottawa, Canada
  - Division of Neurology, Department of Medicine, The Ottawa Hospital, Ottawa, Canada
5
Mok H, Ostendorf E, Ganninger A, Adler AJ, Hazan G, Haspel JA. Circadian immunity from bench to bedside: a practical guide. J Clin Invest 2024; 134:e175706. [PMID: 38299593 PMCID: PMC10836804 DOI: 10.1172/jci175706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024]
Abstract
The immune system is built to counteract unpredictable threats, yet it relies on predictable cycles of activity to function properly. Daily rhythms in immune function are an expanding area of study, and many originate from a genetically based timekeeping mechanism known as the circadian clock. The challenge is how to harness these biological rhythms to improve medical interventions. Here, we review recent literature documenting how circadian clocks organize fundamental innate and adaptive immune activities, the immunologic consequences of circadian rhythm and sleep disruption, and persisting knowledge gaps in the field. We then consider the evidence linking circadian rhythms to vaccination, an important clinical realization of immune function. Finally, we discuss practical steps to translate circadian immunity to the patient's bedside.
Affiliation(s)
- Huram Mok
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Elaine Ostendorf
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Alex Ganninger
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Avi J. Adler
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
- Guy Hazan
  - Department of Pediatrics, Soroka University Medical Center, Beer-Sheva, Israel
  - Research and Innovation Center, Saban Children’s Hospital, Beer-Sheva, Israel
- Jeffrey A. Haspel
  - Division of Pulmonary and Critical Care Medicine, Department of Internal Medicine, Washington University School of Medicine, St. Louis, Missouri, USA
6
Prasanna A, Jing B, Plopper G, Miller KK, Sanjak J, Feng A, Prezek S, Vidyaprakash E, Thovarai V, Maier EJ, Bhattacharya A, Naaman L, Stephens H, Watford S, Boscardin WJ, Johanson E, Lienau A. Synthetic Health Data Can Augment Community Research Efforts to Better Inform the Public During Emerging Pandemics. medRxiv 2023:2023.12.11.23298687. [PMID: 38168217 PMCID: PMC10760275 DOI: 10.1101/2023.12.11.23298687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
The COVID-19 pandemic had disproportionate effects on the Veteran population due to the increased prevalence of medical and environmental risk factors. Synthetic electronic health record (EHR) data can help meet the acute need for Veteran population-specific predictive modeling efforts by avoiding the strict barriers to access, currently present within Veteran Health Administration (VHA) datasets. The U.S. Food and Drug Administration (FDA) and the VHA launched the precisionFDA COVID-19 Risk Factor Modeling Challenge to develop COVID-19 diagnostic and prognostic models; identify Veteran population-specific risk factors; and test the usefulness of synthetic data as a substitute for real data. The use of synthetic data boosted challenge participation by providing a dataset that was accessible to all competitors. Models trained on synthetic data showed similar but systematically inflated model performance metrics to those trained on real data. The important risk factors identified in the synthetic data largely overlapped with those identified from the real data, and both sets of risk factors were validated in the literature. Tradeoffs exist between synthetic data generation approaches based on whether a real EHR dataset is required as input. Synthetic data generated directly from real EHR input will more closely align with the characteristics of the relevant cohort. This work shows that synthetic EHR data will have practical value to the Veterans' health research community for the foreseeable future.
Affiliation(s)
- Bocheng Jing
  - Northern California Institute for Research and Education
  - San Francisco VA Medical Center
- Sean Watford
  - Booz Allen Hamilton
  - Currently U.S. Environmental Protection Agency
- W John Boscardin
  - University of California, San Francisco, Department of Medicine
  - University of California, San Francisco, Department of Epidemiology & Biostatistics
7
Giuffrè M, Shung DL. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digit Med 2023; 6:186. [PMID: 37813960 PMCID: PMC10562365 DOI: 10.1038/s41746-023-00927-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 04/15/2023] [Accepted: 09/14/2023] [Indexed: 10/11/2023]
Abstract
Data-driven decision-making in modern healthcare underpins innovation and predictive analytics in public health and clinical research. Synthetic data has shown promise in finance and economics to improve risk assessment, portfolio optimization, and algorithmic trading. However, higher stakes, potential liabilities, and healthcare practitioner distrust make clinical use of synthetic data difficult. This paper explores the potential benefits and limitations of synthetic data in the healthcare analytics context. We begin with real-world healthcare applications of synthetic data that inform government policy, enhance data privacy, and augment datasets for predictive analytics. We then preview future applications of synthetic data in the emergent field of digital twin technology. We explore the issues of data quality and data bias in synthetic data, which can limit applicability across different applications in the clinical context, and privacy concerns stemming from data misuse and risk of re-identification. Finally, we evaluate the role of regulatory agencies in promoting transparency and accountability and propose strategies for risk mitigation such as Differential Privacy (DP) and a dataset chain of custody to maintain data integrity, traceability, and accountability. Synthetic data can improve healthcare, but measures to protect patient well-being and maintain ethical standards are key to promoting responsible use.
Affiliation(s)
- Mauro Giuffrè
  - Department of Medicine (Digestive Diseases), Yale School of Medicine, Yale University, New Haven, CT, USA
  - Department of Medical, Surgical and Health Science, University of Trieste, Trieste, Italy
- Dennis L Shung
  - Department of Medicine (Digestive Diseases), Yale School of Medicine, Yale University, New Haven, CT, USA
8
Ang CYS, Chiew YS, Wang X, Ooi EH, Nor MBM, Cove ME, Chase JG. Virtual patient with temporal evolution for mechanical ventilation trial studies: A stochastic model approach. Computer Methods and Programs in Biomedicine 2023; 240:107728. [PMID: 37531693 DOI: 10.1016/j.cmpb.2023.107728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/27/2023] [Accepted: 07/19/2023] [Indexed: 08/04/2023]
Abstract
BACKGROUND AND OBJECTIVE Healthcare datasets are plagued by issues of data scarcity and class imbalance. Clinically validated virtual patient (VP) models can provide accurate in-silico representations of real patients and thus a means for synthetic data generation in hospital critical care settings. This research presents a realistic, time-varying mechanically ventilated respiratory failure VP profile synthesised using a stochastic model. METHODS A stochastic model was developed using respiratory elastance (Ers) data from two clinical cohorts and averaged over 30-minute time intervals. The stochastic model was used to generate future Ers data based on current Ers values with added normally distributed random noise. Self-validation of the VPs was performed via Monte Carlo simulation and retrospective Ers profile fitting. A stochastic VP cohort of temporal Ers evolution was synthesised and then compared to an independent retrospective patient cohort data in a virtual trial across several measured patient responses, where similarity of profiles validates the realism of stochastic model generated VP profiles. RESULTS A total of 120,000 3-hour VPs for pressure control (PC) and volume control (VC) ventilation modes are generated using stochastic simulation. Optimisation of the stochastic simulation process yields an ideal noise percentage of 5-10% and simulation iteration of 200,000 iterations, allowing the simulation of a realistic and diverse set of Ers profiles. Results of self-validation show the retrospective Ers profiles were able to be recreated accurately with a mean squared error of only 0.099 [0.009-0.790]% for the PC cohort and 0.051 [0.030-0.126]% for the VC cohort. A virtual trial demonstrates the ability of the stochastic VP cohort to capture Ers trends within and beyond the retrospective patient cohort providing cohort-level validation. 
CONCLUSION VPs capable of temporal evolution demonstrate feasibility for use in designing, developing, and optimising bedside MV guidance protocols through in-silico simulation and validation. Overall, the temporal VPs developed using stochastic simulation alleviate the need for lengthy, resource intensive, high cost clinical trials, while facilitating statistically robust virtual trials, ultimately leading to improved patient care and outcomes in mechanical ventilation.
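The idea of generating a future Ers value from the current one plus normally distributed noise can be sketched as a simple random walk. This is not the authors' stochastic model, which is fitted to clinical cohort data; it is a deliberately simplified stand-in. It assumes the noise percentage is the standard deviation of a multiplicative Gaussian perturbation, uses the 30-minute step and 3-hour horizon mentioned in the abstract, and takes a hypothetical starting elastance. All names are illustrative.

```python
import random


def simulate_ers_profile(ers_start, noise_fraction=0.05, hours=3.0,
                         step_minutes=30, seed=None):
    """Generate one virtual-patient elastance (Ers) profile.

    Each step multiplies the current value by (1 + noise), where noise is
    drawn from a normal distribution with mean 0 and standard deviation
    `noise_fraction` (assumed reading of the reported 5-10% noise).
    """
    rng = random.Random(seed)                 # local generator for reproducibility
    n_steps = int(hours * 60 / step_minutes)  # 3 h at 30 min gives 6 steps
    profile = [ers_start]
    for _ in range(n_steps):
        noise = rng.gauss(0.0, noise_fraction)
        next_value = profile[-1] * (1.0 + noise)
        profile.append(max(next_value, 1e-6))  # keep elastance strictly positive
    return profile


if __name__ == "__main__":
    # Hypothetical starting elastance of 25 (units as in the source data).
    print(simulate_ers_profile(25.0, noise_fraction=0.05, seed=42))
```

Repeating the call with different seeds yields a cohort of distinct profiles, which is the sense in which a stochastic generator can supply many virtual patients from limited real data.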
Affiliation(s)
- Xin Wang
  - School of Engineering, Monash University Malaysia, Selangor, Malaysia
- Ean Hin Ooi
  - School of Engineering, Monash University Malaysia, Selangor, Malaysia
- Mohd Basri Mat Nor
  - Kulliyah of Medicine, International Islamic University Malaysia, Pahang, Malaysia
- Matthew E Cove
  - Division of Respiratory & Critical Care Medicine, Department of Medicine, National University Hospital, Singapore
- J Geoffrey Chase
  - Center of Bioengineering, University of Canterbury, Christchurch, New Zealand
9
Greenberg JK, Landman JM, Kelly MP, Pennicooke BH, Molina CA, Foraker RE, Ray WZ. Leveraging Artificial Intelligence and Synthetic Data Derivatives for Spine Surgery Research. Global Spine J 2023; 13:2409-2421. [PMID: 35373623 PMCID: PMC10538345 DOI: 10.1177/21925682221085535] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
STUDY DESIGN Retrospective cohort study. OBJECTIVES Leveraging electronic health records (EHRs) for spine surgery research is impeded by concerns regarding patient privacy and data ownership. Synthetic data derivatives may help overcome these limitations. This study's objective was to validate the use of synthetic data for spine surgery research. METHODS Data came from the EHR from 15 hospitals. Patients that underwent anterior cervical or posterior lumbar fusion (2010-2020) were included. Real data were obtained from the EHR. Synthetic data was generated to simulate the properties of the real data, without maintaining a one-to-one correspondence with real patients. Within each cohort, ability to predict 30-day readmissions and 30-day complications was evaluated using logistic regression and extreme gradient boosting machines (XGBoost). RESULTS We identified 9,072 real and 9,088 synthetic cervical fusion patients. Descriptive characteristics were nearly identical between the 2 datasets. When predicting readmission, models built using real and synthetic data both had c-statistics of .69-.71 using logistic regression and XGBoost. Among 12,111 real and 12,126 synthetic lumbar fusion patients, descriptive characteristics were nearly the same for most variables. Using logistic regression and XGBoost to predict readmission, discrimination was similar with models built using real and synthetic data (c-statistics .66-.69). When predicting complications, models derived using real and synthetic data showed similar discrimination in both cohorts. Despite some differences, the most influential predictors were similar in the real and synthetic datasets. CONCLUSION Synthetic data replicate most descriptive and predictive properties of real data, and therefore may expand EHR research in spine surgery.
Affiliation(s)
- Jacob K. Greenberg
  - Departments of Neurological Surgery, Medicine and Orthopaedic Surgery, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Joshua M. Landman
  - Departments of Neurological Surgery, Medicine and Orthopaedic Surgery, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Brenton H. Pennicooke
  - Departments of Neurological Surgery, Medicine and Orthopaedic Surgery, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Camilo A. Molina
  - Departments of Neurological Surgery, Medicine and Orthopaedic Surgery, Washington University School of Medicine in St Louis, St Louis, MO, USA
- Wilson Z. Ray
  - Departments of Neurological Surgery, Medicine and Orthopaedic Surgery, Washington University School of Medicine in St Louis, St Louis, MO, USA
10
Zuber S, Bechtiger L, Bodelet JS, Golin M, Heumann J, Kim JH, Klee M, Mur J, Noll J, Voll S, O’Keefe P, Steinhoff A, Zölitz U, Muniz-Terrera G, Shanahan L, Shanahan MJ, Hofer SM. An integrative approach for the analysis of risk and health across the life course: challenges, innovations, and opportunities for life course research. Discover Social Science and Health 2023; 3:14. [PMID: 37469576 PMCID: PMC10352429 DOI: 10.1007/s44155-023-00044-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2023] [Accepted: 06/26/2023] [Indexed: 07/21/2023]
Abstract
Life course epidemiology seeks to understand the intricate relationships between risk factors and health outcomes across different stages of life to inform prevention and intervention strategies to optimize health throughout the lifespan. However, extant evidence has predominantly been based on separate analyses of data from individual birth cohorts or panel studies, which may not be sufficient to unravel the complex interplay of risk and health across different contexts. We highlight the importance of a multi-study perspective that enables researchers to: (a) Compare and contrast findings from different contexts and populations, which can help identify generalizable patterns and context-specific factors; (b) Examine the robustness of associations and the potential for effect modification by factors such as age, sex, and socioeconomic status; and (c) Improve statistical power and precision by pooling data from multiple studies, thereby allowing for the investigation of rare exposures and outcomes. This integrative framework combines the advantages of multi-study data with a life course perspective to guide research in understanding life course risk and resilience on adult health outcomes by: (a) Encouraging the use of harmonized measures across studies to facilitate comparisons and synthesis of findings; (b) Promoting the adoption of advanced analytical techniques that can accommodate the complexities of multi-study, longitudinal data; and (c) Fostering collaboration between researchers, data repositories, and funding agencies to support the integration of longitudinal data from diverse sources. An integrative approach can help inform the development of individualized risk scores and personalized interventions to promote health and well-being at various life stages.
Affiliation(s)
- Sascha Zuber
  - Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC Canada
  - Center for the Interdisciplinary Study of Gerontology and Vulnerability, University of Geneva, Geneva, Switzerland
- Laura Bechtiger
  - Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
- Marta Golin
  - Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
- Jens Heumann
  - Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
- Jung Hyun Kim
  - University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Matthias Klee
  - University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Jure Mur
  - University of Edinburgh, Edinburgh, Scotland
- Jennie Noll
  - Pennsylvania State University, State College, PA USA
- Stacey Voll
  - Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC Canada
- Patrick O’Keefe
  - Department of Neurology, Oregon Health & Science University, Portland, OR USA
- Annekatrin Steinhoff
  - Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
  - University Hospital of Child and Adolescent Psychiatry and Psychotherapy, University of Bern, Bern, Switzerland
- Ulf Zölitz
  - Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
- Lilly Shanahan
  - Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
  - Department of Psychology, University of Zürich, Zürich, Switzerland
- Michael J. Shanahan
  - Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
  - Department of Sociology, University of Zürich, Zürich, Switzerland
- Scott M. Hofer
  - Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC Canada
  - Department of Neurology, Oregon Health & Science University, Portland, OR USA
11
|
Azizi Z, Lindner S, Shiba Y, Raparelli V, Norris CM, Kublickiene K, Herrero MT, Kautzky-Willer A, Klimek P, Gisinger T, Pilote L, El Emam K. A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health. Sci Rep 2023; 13:11540. [PMID: 37460705 DOI: 10.1038/s41598-023-38457-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Accepted: 07/08/2023] [Indexed: 07/20/2023]
Abstract
Sharing health data for research purposes across international jurisdictions has been a challenge due to privacy concerns. Two privacy enhancing technologies that can enable such sharing are synthetic data generation (SDG) and federated analysis, but their relative strengths and weaknesses have not been evaluated thus far. In this study we compared SDG with federated analysis to enable such international comparative studies. The objective of the analysis was to assess country-level differences in the role of sex on cardiovascular health (CVH) using a pooled dataset of Canadian and Austrian individuals. The Canadian data was synthesized and sent to the Austrian team for analysis. The utility of the pooled (synthetic Canadian + real Austrian) dataset was evaluated by comparing the regression results from the two approaches. The privacy of the Canadian synthetic data was assessed using a membership disclosure test which showed an F1 score of 0.001, indicating low privacy risk. The outcome variable of interest was CVH, calculated through a modified CANHEART index. The main and interaction effect parameter estimates of the federated and pooled analyses were consistent and directionally the same. It took approximately one month to set up the synthetic data generation platform and generate the synthetic data, whereas it took over 1.5 years to set up the federated analysis system. Synthetic data generation can be an efficient and effective tool for enabling multi-jurisdictional studies while addressing privacy concerns.
Affiliation(s)
- Zahra Azizi
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada
- Simon Lindner
  - Department of Internal Medicine III, Division of Endocrinology and Metabolism, Gender Medicine Unit, Medical University of Vienna, Vienna, Austria
- Yumika Shiba
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada
  - Faculty of Medicine, McGill University, Montreal, Canada
- Valeria Raparelli
  - Department of Translational Medicine, University of Ferrara, Ferrara, Italy
  - Faculty of Nursing, University of Alberta, Edmonton, AB, Canada
- Colleen M Norris
  - Faculty of Nursing, University of Alberta, Edmonton, AB, Canada
  - Heart and Stroke Strategic Clinical Networks, Alberta Health Services, Alberta, Canada
- Maria Trinidad Herrero
  - Clinical & Experimental Neuroscience (NiCE-IMIB-IUIE), School of Medicine, University of Murcia, Murcia, Spain
- Alexandra Kautzky-Willer
  - Department of Internal Medicine III, Division of Endocrinology and Metabolism, Gender Medicine Unit, Medical University of Vienna, Vienna, Austria
- Peter Klimek
  - Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Vienna, Austria
  - Complexity Science Hub Vienna, Vienna, Austria
- Teresa Gisinger
  - Division of Endocrinology and Metabolism, Medical University of Vienna, Vienna, Austria
- Louise Pilote
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada
  - Divisions of Clinical Epidemiology and General Internal Medicine, McGill University Health Centre Research Institute, Montreal, QC, Canada
- Khaled El Emam
  - Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
  - Replica Analytics Ltd, Ottawa, ON, Canada
12
Deniz-Garcia A, Fabelo H, Rodriguez-Almeida AJ, Zamora-Zamorano G, Castro-Fernandez M, Alberiche Ruano MDP, Solvoll T, Granja C, Schopf TR, Callico GM, Soguero-Ruiz C, Wägner AM. Quality, Usability, and Effectiveness of mHealth Apps and the Role of Artificial Intelligence: Current Scenario and Challenges. J Med Internet Res 2023; 25:e44030. [PMID: 37140973 PMCID: PMC10196903 DOI: 10.2196/44030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/09/2022] [Revised: 02/19/2023] [Accepted: 03/10/2023] [Indexed: 03/12/2023]
Abstract
The use of artificial intelligence (AI) and big data in medicine has increased in recent years. Indeed, the use of AI in mobile health (mHealth) apps could considerably assist both individuals and health care professionals in the prevention and management of chronic diseases, in a person-centered manner. Nonetheless, there are several challenges that must be overcome to provide high-quality, usable, and effective mHealth apps. Here, we review the rationale and guidelines for the implementation of mHealth apps and the challenges regarding quality, usability, and user engagement and behavior change, with a special focus on the prevention and management of noncommunicable diseases. We suggest that a cocreation-based framework is the best method to address these challenges. Finally, we describe the current and future roles of AI in improving personalized medicine and provide recommendations for developing AI-based mHealth apps. We conclude that the implementation of AI and mHealth apps for routine clinical practice and remote health care will not be feasible until we overcome the main challenges regarding data privacy and security, quality assessment, and the reproducibility and uncertainty of AI results. Moreover, there is a lack of both standardized methods to measure the clinical outcomes of mHealth apps and techniques to encourage user engagement and behavior changes in the long term. We expect that in the near future, these obstacles will be overcome and that the ongoing European project, Watching the risk factors (WARIFA), will provide considerable advances in the implementation of AI-based mHealth apps for disease prevention and health promotion.
Affiliation(s)
- Alejandro Deniz-Garcia
  - Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
- Himar Fabelo
  - Complejo Hospitalario Universitario Insular - Materno Infantil, Fundación Canaria Instituto de Investigación Sanitaria de Canarias, Las Palmas de Gran Canaria, Spain
  - Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- Antonio J Rodriguez-Almeida
  - Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- Garlene Zamora-Zamorano
  - Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
  - Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- Maria Castro-Fernandez
  - Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- Maria Del Pino Alberiche Ruano
  - Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
  - Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- Terje Solvoll
  - Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway
  - Faculty of Nursing and Health Sciences, Nord University, Bodø, Norway
- Conceição Granja
  - Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway
  - Faculty of Nursing and Health Sciences, Nord University, Bodø, Norway
- Thomas Roger Schopf
  - Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway
- Gustavo M Callico
  - Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
- Cristina Soguero-Ruiz
  - Departamento de Teoría de la Señal y Comunicaciones y Sistemas Telemáticos y Computación, Universidad Rey Juan Carlos, Madrid, Spain
- Ana M Wägner
  - Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
  - Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
13
Ganguli R, Lad R, Lin A, Yu X. Novel Generative Recurrent Neural Network Framework to Produce Accurate, Applicable, and Deidentified Synthetic Medical Data for Patients With Metastatic Cancer. JCO Clin Cancer Inform 2023; 7:e2200125. [PMID: 37130342 DOI: 10.1200/cci.22.00125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
PURPOSE Sensitive patient data cannot be easily shared/analyzed, severely limiting the innovative progress of research, specifically for marginalized/under-represented populations. Existing methods of deidentification are subject to data breaches. The objective of this study was to develop a neural network capable of generating a synthetic version of data for patients with novel postoperative metastatic cancer. METHODS We analyzed a metastatic cancer patient cohort of 167,474 patients obtained from the National Surgical Quality Improvement Program. Twenty-seven clinical features were analyzed. We created a volume-matched synthetic cohort of 167,474 patients and a reduced-size synthetic cohort of 5,000 patients. The volume-matched and reduced-size synthetic cohorts were compared against the ground truth data to analyze differences in principal component distribution, underlying statistical properties/associations, intervariable correlations, and machine learning classifier performance when developed on the synthetic data. RESULTS Among 167,474 patients with metastatic cancer in the original data, 50,669 (30.3%) died within 30 days of their index surgery. Our model was able to accurately capture underlying statistical properties, principal components, and intervariable correlations within the ground truth data, yielding an accuracy of 93.2% with a loss of 0.21%, and develop synthetic data capable of training accurate machine learning classifiers. The reduced-size synthetic data accurately replicated all categorical variables and every continuous variable with statistically similar records (P > .05), with the sole exception of preoperative albumin (P < .05). The volume-matched synthetic data frame was able to accurately replicate all categorical variables (P > .05). 
CONCLUSION This described methodology can be applied to any structured medical data from any setting, significantly expedite scientific analysis/innovation, and be used to develop improved predictive classifiers with boosted tree-based algorithms, serving as the potential new gold standard of medical data sharing and data augmentation.
Affiliation(s)
- Reetam Ganguli
  - Brown University, Providence, RI
  - Dartmouth College, Hanover, NH
- Rishik Lad
  - Warren Alpert Medical School of Brown University, Providence, RI
14
Davis SE, Ssemaganda H, Koola JD, Mao J, Westerman D, Speroff T, Govindarajulu US, Ramsay CR, Sedrakyan A, Ohno-Machado L, Resnic FS, Matheny ME. Simulating complex patient populations with hierarchical learning effects to support methods development for post-market surveillance. BMC Med Res Methodol 2023; 23:89. [PMID: 37041457 PMCID: PMC10088292 DOI: 10.1186/s12874-023-01913-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 04/04/2023] [Indexed: 04/13/2023]
Abstract
BACKGROUND Validating new algorithms, such as methods to disentangle intrinsic treatment risk from risk associated with experiential learning of novel treatments, often requires knowing the ground truth for data characteristics under investigation. Since the ground truth is inaccessible in real world data, simulation studies using synthetic datasets that mimic complex clinical environments are essential. We describe and evaluate a generalizable framework for injecting hierarchical learning effects within a robust data generation process that incorporates the magnitude of intrinsic risk and accounts for known critical elements in clinical data relationships. METHODS We present a multi-step data generating process with customizable options and flexible modules to support a variety of simulation requirements. Synthetic patients with nonlinear and correlated features are assigned to provider and institution case series. The probabilities of treatment and outcome assignment are associated with patient features based on user definitions. Risk due to experiential learning by providers and/or institutions when novel treatments are introduced is injected at various speeds and magnitudes. To further reflect real-world complexity, users can request missing values and omitted variables. We illustrate an implementation of our method in a case study using MIMIC-III data for reference patient feature distributions. RESULTS Realized data characteristics in the simulated data reflected specified values. Apparent deviations in treatment effects and feature distributions, though not statistically significant, were most common in small datasets (n < 3000) and attributable to random noise and variability in estimating realized values in small samples. When learning effects were specified, synthetic datasets exhibited changes in the probability of an adverse outcome as cases accrued for the treatment group impacted by learning, and stable probabilities as cases accrued for the treatment group not affected by learning. CONCLUSIONS Our framework extends clinical data simulation techniques beyond generation of patient features to incorporate hierarchical learning effects. This enables the complex simulation studies required to develop and rigorously test algorithms developed to disentangle treatment safety signals from the effects of experiential learning. By supporting such efforts, this work can help identify training opportunities, avoid unwarranted restriction of access to medical advances, and hasten treatment improvements.
Affiliation(s)
- Sharon E Davis
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA
- Henry Ssemaganda
  - Comparative Effectiveness Research Institute, Lahey Hospital and Medical Center, 41 Mall Road, Burlington, MA, 01803, USA
- Jejo D Koola
  - UC Health Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr. MC 0728, La Jolla, San Diego, CA, 92093-0728, USA
- Jialin Mao
  - Department of Population Health Sciences, Weill Cornell Medicine, 1300 York Avenue, New York, NY, 10065, USA
- Dax Westerman
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA
- Theodore Speroff
  - Departments of Medicine and Biostatistics, Vanderbilt University Medical Center, 1313 21St Avenue South, Oxford House, Room 209, Nashville, TN, 37232, USA
- Usha S Govindarajulu
  - Center for Biostatistics, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1077, New York, NY, 10029, USA
- Craig R Ramsay
  - Health Services Research Unit, University of Aberdeen, Health Sciences Building, Foresterhill, 3rd Floor, Aberdeen, AB25 2ZD, UK
- Art Sedrakyan
  - Department of Population Health Sciences, Weill Cornell Medicine, 1300 York Avenue, New York, NY, 10065, USA
- Lucila Ohno-Machado
  - Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, New Haven, CT, 06510, USA
- Frederic S Resnic
  - Division of Cardiovascular Medicine and Comparative Effectiveness Research Institute, Lahey Hospital and Medical Center, Tufts University School of Medicine, 41 Burlington Mall Road, Burlington, MA, 01805, USA
- Michael E Matheny
  - Departments of Biomedical Informatics, Biostatistics, and Medicine, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA
  - Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, 1310 24th Avenue South, Nashville, TN, 37212, USA
15
Mosquera L, El Emam K, Ding L, Sharma V, Zhang XH, Kababji SE, Carvalho C, Hamilton B, Palfrey D, Kong L, Jiang B, Eurich DT. A method for generating synthetic longitudinal health data. BMC Med Res Methodol 2023; 23:67. [PMID: 36959532 PMCID: PMC10034254 DOI: 10.1186/s12874-023-01869-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 02/19/2023] [Indexed: 03/25/2023]
Abstract
Getting access to administrative health data for research purposes is a difficult and time-consuming process due to increasingly demanding privacy regulations. An alternative method for sharing administrative health data would be to share synthetic datasets where the records do not correspond to real individuals, but the patterns and relationships seen in the data are reproduced. This paper assesses the feasibility of generating synthetic administrative health data using a recurrent deep learning model. Our data comes from 120,000 individuals from Alberta Health's administrative health database. We assess how similar our synthetic data is to the real data using utility assessments that assess the structure and general patterns in the data as well as by recreating a specific analysis in the real data commonly applied to this type of administrative health data. We also assess the privacy risks associated with the use of this synthetic dataset. Generic utility assessments used Hellinger distance to quantify the difference in distributions between real and synthetic datasets for event types (0.027), attributes (mean 0.0417), and Markov transition matrices (order 1 mean absolute difference: 0.0896, sd: 0.159; order 2: mean Hellinger distance 0.2195, sd: 0.2724). The Hellinger distance between the joint distributions was 0.352, and the similarity of random cohorts generated from real and synthetic data had a mean Hellinger distance of 0.3 and mean Euclidean distance of 0.064, indicating small differences between the distributions in the real data and the synthetic data. By applying a realistic analysis to both real and synthetic datasets, Cox regression hazard ratios achieved a mean confidence interval overlap of 68% for adjusted hazard ratios among 5 key outcomes of interest, indicating synthetic data produces similar analytic results to real data. The privacy assessment concluded that the attribution disclosure risk associated with this synthetic dataset was substantially less than the typical 0.09 acceptable risk threshold. Based on these metrics, our results show that our synthetic data is suitably similar to the real data and could be shared for research purposes, thereby alleviating concerns associated with the sharing of real data in some circumstances.
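For readers unfamiliar with the metric, the following is a minimal Python sketch of the Hellinger distance between two discrete distributions, the quantity this abstract reports for event types and attributes. The function names `to_distribution` and `hellinger` and the toy event labels are illustrative assumptions, not code or data from the study.

```python
import math
from collections import Counter


def to_distribution(values):
    """Turn a list of categorical values into relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (dicts of probabilities).

    Returns 0.0 for identical distributions and 1.0 for distributions
    that share no categories.
    """
    categories = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(c, 0.0)) - math.sqrt(q.get(c, 0.0))) ** 2
        for c in categories
    )
    return math.sqrt(squared / 2.0)


# Toy example with made-up event types (hypothetical, not from the paper).
real_events = ["visit", "visit", "lab", "drug"]
synthetic_events = ["visit", "lab", "lab", "drug"]
distance = hellinger(to_distribution(real_events), to_distribution(synthetic_events))
```

Values close to 0 indicate that the synthetic distribution closely tracks the real one, which is how the small distances reported above are interpreted.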
Affiliation(s)
- Lucy Mosquera
  - Replica Analytics Ltd, Ottawa, ON, Canada
  - Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON, K1J 8L1, Canada
- Khaled El Emam
  - Replica Analytics Ltd, Ottawa, ON, Canada
  - Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON, K1J 8L1, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Lei Ding
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, Canada
- Vishal Sharma
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
- Samer El Kababji
  - Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON, K1J 8L1, Canada
- Dan Palfrey
  - Institute of Health Economics, Edmonton, Alberta, Canada
- Linglong Kong
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, Canada
- Bei Jiang
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, Canada
- Dean T Eurich
  - School of Public Health, University of Alberta, Edmonton, AB, Canada
16
Synthetic data in health care: A narrative review. PLOS Digital Health 2023; 2:e0000082. [PMID: 36812604 PMCID: PMC9931305 DOI: 10.1371/journal.pdig.0000082] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 06/29/2022] [Accepted: 12/06/2022] [Indexed: 01/09/2023]
Abstract
Data are central to research, public health, and the development of health information technology (IT) systems. Nevertheless, access to most data in health care is tightly controlled, which may limit innovation, development, and efficient implementation of new research, products, services, or systems. Using synthetic data is one of the many innovative ways that can allow organizations to share datasets with broader users. However, only a limited set of literature is available that explores its potential and applications in health care. In this review paper, we examined existing literature to bridge the gap and highlight the utility of synthetic data in health care. We searched PubMed, Scopus, and Google Scholar to identify peer-reviewed articles, conference papers, reports, and theses/dissertations related to the generation and use of synthetic datasets in health care. The review identified seven use cases of synthetic data in health care: a) simulation and prediction research, b) hypothesis, methods, and algorithm testing, c) epidemiology/public health research, d) health IT development, e) education and training, f) public release of datasets, and g) linking data. The review also identified readily and publicly accessible health care datasets, databases, and sandboxes containing synthetic data with varying degrees of utility for research, education, and software development. The review provided evidence that synthetic data are helpful in different aspects of health care and research. While the original real data remains the preferred choice, synthetic data hold possibilities in bridging data access gaps in research and evidence-based policymaking.
17
Arora A, Arora A. Machine learning models trained on synthetic datasets of multiple sample sizes for the use of predicting blood pressure from clinical data in a national dataset. PLoS One 2023; 18:e0283094. [PMID: 36928534 PMCID: PMC10019654 DOI: 10.1371/journal.pone.0283094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2023] [Accepted: 03/01/2023] [Indexed: 03/18/2023]
Abstract
INTRODUCTION The potential for synthetic data to act as a replacement for real data in research has attracted attention in recent months due to the prospect of increasing access to data and overcoming data privacy concerns when sharing data. The field of generative artificial intelligence and synthetic data is still early in its development, with a research gap in evidencing that synthetic data can adequately be used to train algorithms that can be used on real data. This study compares the performance of a series of machine learning models trained on real data and synthetic data, based on the National Diet and Nutrition Survey (NDNS). METHODS Features identified to be potentially of relevance by directed acyclic graphs were isolated from the NDNS dataset and used to construct synthetic datasets and impute missing data. Recursive feature elimination identified only four variables needed to predict mean arterial blood pressure: age, sex, weight and height. Bayesian generalised linear regression, random forest and neural network models were constructed based on these four variables to predict blood pressure. Models were trained on the real data training set (n = 2408), a synthetic data training set (n = 2408), a larger synthetic data training set (n = 4816), and a combination of the real and synthetic data training sets (n = 4816). The same test set (n = 424) was used for each model. RESULTS Synthetic datasets demonstrated a high degree of fidelity with the real dataset. There was no significant difference between the performance of models trained on real, synthetic or combined datasets. Mean average error across all models and all training data ranged from 8.12 to 8.33. This indicates that synthetic data was capable of training equally accurate machine learning models as real data. DISCUSSION Further research is needed on a variety of datasets to confirm the utility of synthetic data to replace the use of potentially identifiable patient data. There is also further urgent research needed into evidencing that synthetic data can truly protect patient privacy against adversarial attempts to re-identify real individuals from the synthetic dataset.
Affiliation(s)
- Anmol Arora
  - School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
- Ananya Arora
  - School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
18
Braddon AE, Robinson S, Alati R, Betts KS. Exploring the utility of synthetic data to extract more value from sensitive health data assets: A focused example in perinatal epidemiology. Paediatr Perinat Epidemiol 2022; 37:292-300. [PMID: 36482827 DOI: 10.1111/ppe.12942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2022] [Revised: 11/09/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022]
Abstract
BACKGROUND Privacy, access and security concerns can hinder the availability of health data for research. The use of synthesised data in place of de-identified electronic health records (EHRs) presents an opportunity to conduct research while minimising privacy concerns. OBJECTIVES To examine whether synthesised data can replicate two prenatal epidemiological associations: between prenatal smoking and lower birthweight, and between prenatal mood disorders and lower birthweight, using data synthesised from de-identified health administrative data collections. METHODS We generated two synthetic datasets, using parametric and non-parametric data generating methods, and examined the synthetic data for evidence of privacy concerns. Next, univariable and multivariable logistic regression was utilised to estimate the associations in both synthetic datasets, with results then compared to the real data. RESULTS Both synthesised datasets performed well in identifying the reduction in birthweight associated with prenatal smoking, while the non-parametric data underestimated the reduction in birthweight associated with prenatal mood disorders. Improbable relationships between some variables were identified in the parametric synthesised data; however, these can be addressed with simple rules during data synthesis. No duplicate rows (i.e., exact copies of de-identified data) were found in the parametric data, while only 0.6% of the rows in the non-parametric data were duplicated. CONCLUSIONS Both synthesised datasets performed well in replicating the statistical properties of the original data while addressing privacy issues. Data synthesis methods provide an opportunity for researchers to utilise health data while managing privacy and security concerns.
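The duplicate-row check mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration assuming rows are simple sequences of field values; the function name `duplicate_fraction` and the toy records are hypothetical and are not taken from the study.

```python
def duplicate_fraction(real_rows, synthetic_rows):
    """Fraction of synthetic rows that are exact copies of some real row."""
    if not synthetic_rows:
        return 0.0
    # Tuples are hashable, so exact-match lookup against the real data is O(1).
    real_set = {tuple(row) for row in real_rows}
    matches = sum(1 for row in synthetic_rows if tuple(row) in real_set)
    return matches / len(synthetic_rows)


# Toy records (smoker flag, mood disorder flag, birthweight in grams); made up.
real = [(1, "yes", 3200), (0, "no", 3500)]
synthetic = [(1, "yes", 3200), (0, "no", 3400), (1, "no", 3100), (0, "yes", 3600)]
share = duplicate_fraction(real, synthetic)
```

A non-zero share does not by itself prove a privacy breach, since common value combinations can recur by chance, but it flags rows worth inspecting.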
Affiliation(s)
- Amy Elise Braddon
  - School of Population Health, Curtin University, Western Australia, Perth, Australia
- Suzanne Robinson
  - School of Population Health, Curtin University, Western Australia, Perth, Australia
- Rosa Alati
  - School of Population Health, Curtin University, Western Australia, Perth, Australia
- Kim S Betts
  - School of Population Health, Curtin University, Western Australia, Perth, Australia
19
Baumfeld Andre E, Carrington N, Siami FS, Hiatt JC, McWilliams C, Hiller C, Surinach A, Zamorano A, Pashos CL, Schulz WL. The Current Landscape and Emerging Applications for Real-World Data in Diagnostics and Clinical Decision Support and its Impact on Regulatory Decision Making. Clin Pharmacol Ther 2022; 112:1172-1182. [PMID: 35213741 PMCID: PMC9790425 DOI: 10.1002/cpt.2565] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/26/2021] [Accepted: 02/03/2022] [Indexed: 01/31/2023]
Abstract
Real-world data (RWD) and real-world evidence (RWE) are becoming essential tools for informing regulatory decision making in health care and offer an opportunity for all stakeholders in the healthcare ecosystem to evaluate medical products throughout their lifecycle. Although considerable interest has been given to regulatory decisions supported by RWE for treatment authorization, especially in rare diseases, less attention has been given to RWD/RWE related to in vitro diagnostic (IVD) products and clinical decision support systems (CDSS). This review examines current regulatory practices in relation to IVD product development and discusses the use of CDSS in assisting clinicians to retrieve, filter, and analyze patient data in support of complex decisions regarding diagnosis and treatment. The review then explores how utilizing RWD could augment regulatory body understanding of test performance, clinical outcomes, and benefit-risk profiles, and how RWD could be leveraged to augment CDSS and improve safety, quality, and efficiency of healthcare practices. Whereas we present examples of RWD assisting in the regulation of IVDs and CDSS, we also highlight key challenges within the current healthcare system which are impeding the potential of RWE to be fully realized. These challenges include issues such as data availability, reliability, accessibility, harmonization, and interoperability, often for reasons specific to diagnostics. Finally, we review ways that these challenges are actively being addressed and discuss how private-public collaborations and the implementation of standardized language and protocols are working toward producing more robust RWD and RWE to support regulatory decision making.
Affiliation(s)
- Flora S. Siami
- Medical Device Innovation Consortium, Arlington, Virginia, USA
- Jo Carol Hiatt
- Medical Device Innovation Consortium, Arlington, Virginia, USA
- Carolyn Hiller
- Medical Device Innovation Consortium, Arlington, Virginia, USA
|
20
|
El Emam K, Mosquera L, Fang X. Validating a membership disclosure metric for synthetic health data. JAMIA Open 2022; 5:ooac083. [PMID: 36238080 PMCID: PMC9553223 DOI: 10.1093/jamiaopen/ooac083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2022] [Revised: 09/13/2022] [Accepted: 09/22/2022] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND One of the increasingly accepted methods to evaluate the privacy of synthetic data is by measuring the risk of membership disclosure. This is a measure of the F1 accuracy that an adversary would correctly ascertain that a target individual from the same population as the real data is in the dataset used to train the generative model, and is commonly estimated using a data partitioning methodology with a 0.5 partitioning parameter. OBJECTIVE Validate the membership disclosure F1 score, evaluate and improve the parametrization of the partitioning method, and provide a benchmark for its interpretation. MATERIALS AND METHODS We performed a simulated membership disclosure attack on 4 population datasets: an Ontario COVID-19 dataset, a state hospital discharge dataset, a national health survey, and an international COVID-19 behavioral survey. Two generative methods were evaluated: sequential synthesis and a generative adversarial network. A theoretical analysis and a simulation were used to determine the correct partitioning parameter that would give the same F1 score as a ground truth simulated membership disclosure attack. RESULTS The default 0.5 parameter can give quite inaccurate membership disclosure values. The proportion of records from the training dataset in the attack dataset must be equal to the sampling fraction of the real dataset from the population. The approach is demonstrated on 7 clinical trial datasets. CONCLUSIONS Our proposed parameterization, as well as interpretation and generative model training guidance provide a theoretically and empirically grounded basis for evaluating and managing membership disclosure risk for synthetic data.
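For readers unfamiliar with the metric, the F1 score of a membership disclosure attack can be sketched as follows. This is a minimal illustration, assuming binary ground-truth and predicted membership labels; the function name `membership_f1` and the toy labels are hypothetical, and the sketch does not reproduce the partitioning methodology or any parameter choice from the study.

```python
def membership_f1(true_members, predicted_members):
    """F1 score of an adversary's membership guesses.

    true_members[i] is 1 if record i was really in the training data;
    predicted_members[i] is 1 if the adversary claims it was.
    (Hypothetical helper for illustration only.)
    """
    pairs = list(zip(true_members, predicted_members))
    tp = sum(1 for t, p in pairs if t and p)          # correct membership claims
    fp = sum(1 for t, p in pairs if not t and p)      # false membership claims
    fn = sum(1 for t, p in pairs if t and not p)      # missed members
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy attack on four target records (assumed values).
print(membership_f1([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

In this toy case the adversary makes one correct claim, one false claim, and misses one member, so precision and recall are both 0.5.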
Affiliation(s)
- Khaled El Emam
- Corresponding Author: Khaled El Emam, PhD, Research Institute, Children’s Hospital of Eastern Ontario, 401 Smyth Road, Ottawa, Ontario K1H 8L1, Canada
- Lucy Mosquera
- Data Science, Replica Analytics Ltd., Ottawa, Ontario, Canada; Research Institute, Children’s Hospital of Eastern Ontario, Ottawa, Ontario, Canada
- Xi Fang
- Data Science, Replica Analytics Ltd., Ottawa, Ontario, Canada
|
21
|
Shi J, Wang D, Tesei G, Norgeot B. Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments. Front Artif Intell 2022; 5:918813. [PMID: 36187323 PMCID: PMC9515575 DOI: 10.3389/frai.2022.918813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2022] [Accepted: 08/15/2022] [Indexed: 12/03/2022] Open
Abstract
In the past decade, there has been exponentially growing interest in the use of observational data collected as a part of routine healthcare practice to determine the effect of a treatment with causal inference models. Validation of these models, however, has been a challenge because the ground truth is unknown: only one treatment-outcome pair for each person can be observed. There have been multiple efforts to fill this void using synthetic data where the ground truth can be generated. However, to date, these datasets have been severely limited in their utility either by being modeled after small non-representative patient populations, being dissimilar to real target populations, or only providing known effects for two cohorts (treated vs. control). In this work, we produced a large-scale and realistic synthetic dataset that provides ground truth effects for over 10 hypertension treatments on blood pressure outcomes. The synthetic dataset was created by modeling a nationwide cohort of more than 580,000 hypertension patients, including each person's multi-year history of diagnoses, medications, and laboratory values. We designed a data generation process by combining an adapted ADS-GAN model for fictitious patient information generation and a neural network for treatment outcome generation. A Wasserstein distance of 0.35 demonstrates that our synthetic data follows a nearly identical joint distribution to the patient cohort used to generate the data. Patient privacy was a primary concern for this study; the ϵ-identifiability metric, which estimates the probability of actual patients being identified, is 0.008%, ensuring that our synthetic data cannot be used to identify any actual patients. To demonstrate its usage, we tested the bias in causal effect estimation of four well-established models using this dataset. The approach we used can be readily extended to other types of diseases in the clinical domain, and to datasets in other domains as well.
|
22
|
La Salvia M, Torti E, Leon R, Fabelo H, Ortega S, Martinez-Vega B, Callico GM, Leporati F. Deep Convolutional Generative Adversarial Networks to Enhance Artificial Intelligence in Healthcare: A Skin Cancer Application. Sensors (Basel) 2022; 22:6145. [PMID: 36015906 PMCID: PMC9416026 DOI: 10.3390/s22166145] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/11/2022] [Revised: 08/04/2022] [Accepted: 08/14/2022] [Indexed: 06/15/2023]
Abstract
In recent years, researchers designed several artificial intelligence solutions for healthcare applications, which usually evolved into functional solutions for clinical practice. Furthermore, deep learning (DL) methods are well-suited to process the broad amounts of data acquired by wearable devices, smartphones, and other sensors employed in different medical domains. Conceived to serve the role of diagnostic tool and surgical guidance, hyperspectral images emerged as a non-contact, non-ionizing, and label-free technology. However, the lack of large datasets to efficiently train the models limits DL applications in the medical field. Hence, its usage with hyperspectral images is still at an early stage. We propose a deep convolutional generative adversarial network to generate synthetic hyperspectral images of epidermal lesions, targeting skin cancer diagnosis, and overcome small-sized datasets challenges to train DL architectures. Experimental results show the effectiveness of the proposed framework, capable of generating synthetic data to train DL classifiers.
Affiliation(s)
- Marco La Salvia
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy
- Emanuele Torti
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy
- Raquel Leon
- Research Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35001 Las Palmas de Gran Canaria, Spain
- Himar Fabelo
- Research Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35001 Las Palmas de Gran Canaria, Spain
- Samuel Ortega
- Research Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35001 Las Palmas de Gran Canaria, Spain
- Norwegian Institute of Food, Fisheries and Aquaculture Research (Nofima), 6122 Tromsø, Norway
- Beatriz Martinez-Vega
- Research Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35001 Las Palmas de Gran Canaria, Spain
- Gustavo M. Callico
- Research Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria (ULPGC), 35001 Las Palmas de Gran Canaria, Spain
- Francesco Leporati
- Department of Electrical, Computer and Biomedical Engineering, University of Pavia, 27100 Pavia, Italy
|
23
|
Door to balloon time in primary percutaneous coronary intervention in ST elevation myocardial infarction: every minute counts. Coron Artery Dis 2022; 33:341-348. [PMID: 35880558 DOI: 10.1097/mca.0000000000001145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/13/2023]
Abstract
OBJECTIVES This study examines relationships between door to balloon (D2B) time and subsequent admissions due to heart failure (HF), acute coronary syndrome (ACS), and mortality for up to 1 year. BACKGROUND Current guidelines set a goal of 90 min for D2B time in primary percutaneous coronary intervention (PPCI), which has been shown to reduce mortality and adverse events. METHODS Using the MDclone ADAMS system integrated with our electronic medical records, we conducted a retrospective analysis of all patients admitted from home due to ST-elevation myocardial infarction, without any history of HF or coronary disease, who underwent PPCI during 2013-2019. Data on D2B time, baseline clinical and demographic characteristics, and outcomes of HF, ACS, and mortality were collected. The adjusted HR for each of the outcomes was calculated by a multivariate Cox model. RESULTS A total of 826 patients were included in the final analysis. D2B had no significant effect on the incidence of heart failure admissions for up to 1-year follow-up. D2B had a significant effect on mortality at 180 days, showing a 30% increase for each 30-min increase (HR 1.308; CI, 1.046-1.635), as well as on ACS at 90 days (HR 1.307; 1.025-1.638). The 30-min D2B cutoff showed a significant increase in ACS recurrence throughout the follow-up period at 90 days (HR 2.871, 1.239-6.648), 180 days (HR 2.607, 1.255-5.413), and 1 year (HR 1.886, 1.073-3.317). CONCLUSIONS Patients with shorter D2B times had significantly reduced mortality and recurrence of ACS, with no effect on heart failure admission incidence.
|
24
|
Thomas JA, Foraker RE, Zamstein N, Morrow JD, Payne PRO, Wilcox AB. Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). J Am Med Inform Assoc 2022; 29:1350-1365. [PMID: 35357487 PMCID: PMC8992357 DOI: 10.1093/jamia/ocac045] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/14/2021] [Revised: 03/11/2022] [Accepted: 03/28/2022] [Indexed: 11/16/2022] Open
Abstract
OBJECTIVE This study sought to evaluate whether synthetic data derived from a national coronavirus disease 2019 (COVID-19) dataset could be used for geospatial and temporal epidemic analyses. MATERIALS AND METHODS Using an original dataset (n = 1 854 968 severe acute respiratory syndrome coronavirus 2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip code-level epidemic curves, patient characteristics and outcomes, distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity between the data was statistically and qualitatively evaluated. RESULTS In general, synthetic data closely matched original data for epidemic curves, patient characteristics, and outcomes. Synthetic data suppressed labels of zip codes with few total tests (mean = 2.9 ± 2.4; max = 16 tests; 66% reduction of unique zip codes). Epidemic curves and monthly indicator counts were similar between synthetic and original data in a random sample of the most tested (top 1%; n = 171) and for all unsuppressed zip codes (n = 5819), respectively. In small sample sizes, synthetic data utility was notably decreased. DISCUSSION Analyses on the population-level and of densely tested zip codes (which contained most of the data) were similar between original and synthetically derived datasets. Analyses of sparsely tested populations were less similar and had more data suppression. CONCLUSION In general, synthetic data were successfully used to analyze geospatial and temporal trends. Analyses using small sample sizes or populations were limited, in part due to purposeful data label suppression, an attribute disclosure countermeasure. Users should consider data fitness for use in these cases.
Affiliation(s)
- Jason A Thomas
- Corresponding Author: Jason A. Thomas, PhD, Philips North America, LLC, 22100 Bothell Everett Hwy, Bothell, WA 98021, USA
- Randi E Foraker
- Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA; School of Medicine, Institute for Informatics, Washington University in St. Louis, St. Louis, Missouri, USA
- Jon D Morrow
- MDClone Ltd., Be’er Sheva, Israel; Department of Obstetrics and Gynecology, New York University Grossman School of Medicine, New York, New York, USA
- Philip R O Payne
- Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA; School of Medicine, Institute for Informatics, Washington University in St. Louis, St. Louis, Missouri, USA
- Adam B Wilcox
- Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA; School of Medicine, Institute for Informatics, Washington University in St. Louis, St. Louis, Missouri, USA
|
25
|
The Association Between Opioid Use and Opioid Type and the Clinical Course and Outcomes of Acute Pancreatitis. Pancreas 2022; 51:523-530. [PMID: 35835104 DOI: 10.1097/mpa.0000000000002052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
OBJECTIVES Basic science studies suggest that opioids aggravate disease severity and outcomes in acute pancreatitis. We sought to determine the association of opioid use and opioid type with the clinical course and outcome of acute pancreatitis. METHODS In this retrospective single-center observational study, we included all adult patients admitted with acute pancreatitis between 2008 and 2021. Patients were classified into 3 groups based on analgesia type: morphine, nonmorphine opioid, and nonopioid. RESULTS We included 2308 patients. Of the patients, 343 (14.9%) were treated with morphine, 733 (31.8%) were treated with nonmorphine opioids, and 1232 (53.4%) patients were in the nonopioid group. The incidence of 30-day mortality did not differ significantly between study groups: 3.9%, 2.9%, and 4.4% in the nonopioid, nonmorphine-opioid, and morphine groups, respectively (P = 0.366). In multivariate analysis, the composite end point consisting of 30-day mortality, invasive ventilation, emergent abdominal surgery, and need for vasopressors was significantly more likely to occur in the morphine group than in the nonopioid group (adjusted odds ratio, 1.69; 95% confidence interval, 1.1-2.598; P = 0.01). CONCLUSIONS Mortality among acute pancreatitis patients did not differ significantly between patients receiving morphine, nonmorphine opioids, and nonopioids. However, morphine treatment was associated with higher rates of some serious adverse events.
|
26
|
Brzezinski RY, Melloul A, Berliner S, Goldiner I, Stark M, Rogowski O, Banai S, Shenhar-Tsarfaty S, Shacham Y. Early Detection of Inflammation-Prone STEMI Patients Using the CRP Troponin Test (CTT). J Clin Med 2022; 11:jcm11092453. [PMID: 35566579 PMCID: PMC9105044 DOI: 10.3390/jcm11092453] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/17/2022] [Revised: 04/21/2022] [Accepted: 04/25/2022] [Indexed: 02/01/2023] Open
Abstract
Elevated concentrations of C-reactive protein (CRP) early during an acute coronary syndrome (ACS) may reflect the magnitude of the inflammatory response to myocardial damage and are associated with worse outcome. However, the routine measurement of both CRP and cardiac troponin simultaneously in the setting of ST-segment elevation myocardial infarction (STEMI) is not used broadly. Here, we sought to identify and characterize individuals who are prone to an elevated inflammatory response following STEMI by using a combined CRP and troponin test (CTT) and determine their short- and long-term outcome. We retrospectively examined 1186 patients with the diagnosis of acute STEMI, who had at least two successive measurements of combined CRP and cardiac troponin (up to 6 h apart), all within the first 48 h of admission. We used Chi-Square Automatic Interaction Detector (CHAID) tree analysis to determine which parameters, timing (baseline vs. serial measurements), and cut-offs should be used to predict mortality. Patients with high CRP concentrations (above 90th percentile, >33 mg/L) had higher 30-day and all-cause mortality rates compared to the rest of the cohort, regardless of their troponin test status (above or below 118,000 ng/L); 14.4% vs. 2.7%, p < 0.01. Furthermore, patients with both high CRP and high troponin levels on their second measurement had the highest 30-day mortality rates compared to the rest of the cohort; 21.4% vs. 3.7%, p < 0.01. These patients also had the highest all-cause mortality rates after a median follow-up of 4.5 years compared to the rest of the cohort; 42.9% vs. 12.7%, p < 0.01. In conclusion, serial measurements of both CRP and cardiac troponin might detect patients at increased risk for short- and long-term mortality following STEMI. We suggest the future use of the combined CTT as a potential early marker for inflammatory-prone patients with worse outcomes following ACS. This sub-type of patients might benefit from early anti-inflammatory therapy such as colchicine and anti-interleukin-1β agents.
Affiliation(s)
- Rafael Y. Brzezinski
- Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ariel Melloul
- Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Shlomo Berliner
- Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ilana Goldiner
- Department of Clinical Laboratories, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Moshe Stark
- Department of Clinical Laboratories, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Ori Rogowski
- Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Shmuel Banai
- Department of Cardiology, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Shani Shenhar-Tsarfaty
- Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
- Yacov Shacham
- Department of Cardiology, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
|
27
|
The “Coherent Data Set”: Combining Patient Data and Imaging in a Comprehensive, Synthetic Health Record. Electronics 2022. [DOI: 10.3390/electronics11081199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
The “Coherent Data Set” is a novel synthetic data set that leverages structured data from Synthea™ to create a longitudinal, “coherent” patient-level electronic health record (EHR). Comprised of synthetic patients, the Coherent Data Set is publicly available, reproducible using Synthea™, and free of the privacy risks that arise from using real patient data. The Coherent Data Set provides complex and representative health records that can be leveraged by health IT professionals without the risks associated with de-identified patient data. It includes familial genomes that were created through a simulation of the genetic reproduction process; magnetic resonance imaging (MRI) DICOM files created with a voxel-based computational model; clinical notes in the style of traditional subjective, objective, assessment, and plan notes; and physiological data that leverage existing System Biology Markup Language (SBML) models to capture non-linear changes in patient health metrics. HL7 Fast Healthcare Interoperability Resources (FHIR®) links the data together. The models can generate clinically logical health data, but ensuring clinical validity remains a challenge without comparable data to substantiate results. We believe this data set is the first of its kind and a novel contribution to practical health interoperability efforts.
|
28
|
El Emam K, Mosquera L, Fang X, El-Hussuna A. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study. JMIR Med Inform 2022; 10:e35734. [PMID: 35389366 PMCID: PMC9030990 DOI: 10.2196/35734] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 12/15/2021] [Revised: 01/27/2022] [Accepted: 02/13/2022] [Indexed: 01/06/2023] Open
Abstract
Background A regular task by developers and users of synthetic data generation (SDG) methods is to evaluate and compare the utility of these methods. Multiple utility metrics have been proposed and used to evaluate synthetic data. However, they have not been validated in general or for comparing SDG methods. Objective This study evaluates the ability of common utility metrics to rank SDG methods according to performance on a specific analytic workload. The workload of interest is the use of synthetic data for logistic regression prediction models, which is a very frequent workload in health research. Methods We evaluated 6 utility metrics on 30 different health data sets and 3 different SDG methods (a Bayesian network, a Generative Adversarial Network, and sequential tree synthesis). These metrics were computed by averaging across 20 synthetic data sets from the same generative model. The metrics were then tested on their ability to rank the SDG methods based on prediction performance. Prediction performance was defined as the difference between each of the area under the receiver operating characteristic curve and area under the precision-recall curve values on synthetic data logistic regression prediction models versus real data models. Results The utility metric best able to rank SDG methods was the multivariate Hellinger distance based on a Gaussian copula representation of real and synthetic joint distributions. Conclusions This study has validated a generative model utility metric, the multivariate Hellinger distance, which can be used to reliably rank competing SDG methods on the same data set. The Hellinger distance metric can be used to evaluate and compare alternate SDG methods.
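As a rough illustration of the underlying quantity, the Hellinger distance between two discrete distributions can be computed as below. This sketch assumes simple univariate probability vectors; it is not the multivariate, Gaussian-copula-based variant evaluated in the study, and the function name and example values are hypothetical.

```python
import math


def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions.

    p and q are equal-length sequences of probabilities that each sum to 1.
    The result lies in [0, 1]: 0 for identical distributions, 1 for
    distributions with no overlapping support.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of categories")
    squared_gap = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared_gap / 2.0)


# Assumed category frequencies for one variable in real vs. synthetic data.
real = [0.5, 0.3, 0.2]
synthetic = [0.4, 0.4, 0.2]
print(hellinger_distance(real, synthetic))
```

A smaller value indicates that the synthetic marginal distribution is closer to the real one for that variable.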
Affiliation(s)
- Khaled El Emam
- School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada; Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada; Replica Analytics Ltd, Ottawa, ON, Canada
- Lucy Mosquera
- Children's Hospital of Eastern Ontario Research Institute, Ottawa, ON, Canada; Replica Analytics Ltd, Ottawa, ON, Canada
- Xi Fang
- Replica Analytics Ltd, Ottawa, ON, Canada
|
29
|
Bahouth F, Elias A, Ghersin I, Khoury E, Bar O, Sholy H, Khoury J, Azzam ZS. The prognostic value of heart rate at discharge in acute decompensation of heart failure with reduced ejection fraction. ESC Heart Fail 2022; 9:585-594. [PMID: 34821080 PMCID: PMC8788061 DOI: 10.1002/ehf2.13710] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/06/2021] [Revised: 09/30/2021] [Accepted: 10/31/2021] [Indexed: 11/26/2022] Open
Abstract
AIMS The effect of elevated heart rate (HR) on morbidity and mortality is evident in chronic stable heart failure; data in this regard in the acute decompensated heart failure (ADHF) setting are scarce. In this single-centre study, we sought to address the prognostic value of HR and beta-blocker dosage at discharge on all-cause mortality among patients with heart failure and reduced ejection fraction and ADHF. METHODS AND RESULTS In this retrospective observational study, 2945 patients were admitted for the first time with the primary diagnosis of ADHF between January 2008 and February 2018. Patients were divided by resting HR at discharge into three groups (HR < 70 b.p.m., HR 70-90 b.p.m., and HR > 90 b.p.m.). Evidence-based beta-blockers were defined as metoprolol, bisoprolol, and carvedilol. The doses of prescribed beta-blockers were calculated as a percentage of the target dose of each beta-blocker and divided into four quartiles: 0 < Dose ≤ 25%, 25% < Dose ≤ 50%, 50% < Dose ≤ 75%, and >75% of the target dose. Cox regression was used to calculate the hazard ratio for the various HR categories, adjusting for clinical and laboratory variables. At discharge, 1226 patients had an HR < 70 b.p.m., 1347 patients had an HR in the range 70-90 b.p.m., and 372 patients had an HR > 90 b.p.m. The 30-day mortality rate was 2.2%, 3.7%, and 12.1% (P < 0.001), respectively. Concordantly, the 1-year mortality rate was 14.6%, 16.7%, and 30.4% (P < 0.001) among patients with HR < 70 b.p.m., HR 70-90 b.p.m., and HR > 90 b.p.m., respectively. The adjusted hazard ratio was significantly increased only in the HR > 90 b.p.m. category (hazard ratio, 2.318; 95% confidence interval, 1.794-2.996). CONCLUSIONS Patients with ADHF and an HR of <90 b.p.m. at discharge had significantly lower 1-year mortality, independent of the dosage of beta-blocker at discharge. It is conceivable to discharge these patients with lower HR.
Affiliation(s)
- Fadel Bahouth
- Departments of Internal Medicine “B” and “H”, Rambam Health Care Campus, Haifa, Israel
- Heart Institute, Bnei Zion Medical Center, Haifa, Israel
- Adi Elias
- Departments of Internal Medicine “B” and “H”, Rambam Health Care Campus, Haifa, Israel
- Itai Ghersin
- Departments of Internal Medicine “B” and “H”, Rambam Health Care Campus, Haifa, Israel
- Emad Khoury
- Departments of Internal Medicine “B” and “H”, Rambam Health Care Campus, Haifa, Israel
- Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel
- Omer Bar
- Departments of Internal Medicine “B” and “H”, Rambam Health Care Campus, Haifa, Israel
- Johad Khoury
- Pulmonology Division, Lady Davis Carmel Medical Center, Haifa, Israel
- Zaher S. Azzam
- Departments of Internal Medicine “B” and “H”, Rambam Health Care Campus, Haifa, Israel
- Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel
|
30
|
Borreda I, Zukermann R, Epstein D, Marcusohn E. IV Sodium Ferric Gluconate Complex in Patients Hospitalized Due to Acute Decompensated Heart Failure and Iron Deficiency. J Cardiovasc Pharmacol Ther 2022; 27:10742484211055639. [PMID: 34994220 DOI: 10.1177/10742484211055639] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
Background: Patients suffering from heart failure (HF) and iron deficiency (ID) have worse outcomes. Treatment with intravenous (IV) ferric carboxymaltose has been shown to reduce HF rehospitalizations and to improve functional capacity and symptoms in patients with HF and reduced ejection fraction (HFrEF). However, IV ferric carboxymaltose is significantly more expensive than IV sodium ferric gluconate complex (SFGC), limiting its availability to most HF patients around the globe. Methods: A retrospective analysis comparing patients admitted to internal medicine or cardiology departments between January 2013 and December 2018 due to acute decompensated HF (ADHF) and treated with or without IV SFGC on top of standard medical therapy. Results: During the study period, a total of 1863 patients were hospitalized due to ADHF with either HFrEF or HF with preserved ejection fraction (HFpEF). Among them, 840 patients had laboratory evidence of iron deficiency (absolute or functional) and met the inclusion criteria. One hundred twenty-two of them (14.5%) were treated with IV SFGC during the index hospitalization. Patients treated with IV iron were more likely to have a history of ischemic heart disease, atrial fibrillation, and chronic kidney disease. The rate of readmissions due to ADHF was similar between the groups at 30 days, 3 months, and 1 year. Conclusion: High-risk patients hospitalized due to ADHF and treated with IV SFGC showed comparable ADHF readmission rates, compared to those who did not receive iron supplementation.
Affiliation(s)
- Itay Borreda
- Internal Medicine H, Rambam Health Care Campus, Haifa, Israel
- Robert Zukermann
- Intermediate Cardiac Care Unit, Department of Cardiology, Rambam Health Care Campus, Haifa, Israel
- Danny Epstein
- Critical Care Division, Rambam Health Care Campus, Haifa, Israel
- Erez Marcusohn
- Department of Cardiology, Rambam Health Care Campus, Haifa, Israel
|
31
|
Nakhleh A, Saiegh L, Shehadeh N, Weintrob N, Sheikh-Ahmad M, Supino-Rosin L, Alboim S, Gendelman R, Zloczower M. Screening for non-classic congenital adrenal hyperplasia in women: New insights using different immunoassays. Front Endocrinol (Lausanne) 2022; 13:1048663. [PMID: 36704043 PMCID: PMC9871807 DOI: 10.3389/fendo.2022.1048663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/19/2022] [Accepted: 12/19/2022] [Indexed: 01/11/2023] Open
Abstract
CONTEXT The 250-µg cosyntropin stimulation test (CST) is used to diagnose non-classic congenital adrenal hyperplasia (NCCAH). The current recommendation is to perform CST when follicular 17-hydroxyprogesterone (17OHP) is 6-30 nmol/L, a cutoff derived from radioimmunoassay (RIA). Recently, enzyme-linked immunosorbent assay (ELISA) has replaced RIA. OBJECTIVES We aimed to (1) determine the RIA and ELISA-based 17OHP cutoffs at which CST should be performed, and (2) identify predictors of NCCAH. METHODS A retrospective study at an Israeli Health Maintenance Organization. Data were retrieved from women with suspected NCCAH, referred for CST during 2001-2020. NCCAH was defined as a stimulated 17OHP >30 nmol/L. Serum 17OHP levels were assayed by RIA from 1/2000-3/2015, and by ELISA from 4/2015-12/2020. ROC curves were generated and optimal 17OHP thresholds were determined. Multivariate analysis was performed. RESULTS CST was performed in 2409 women (1564 in RIA, 845 in ELISA). NCCAH was diagnosed in 4.7% of the RIA group and 7.5% of the ELISA group. The optimal basal 17OHP cutoff values predicting NCCAH were 6.1 nmol/L in RIA (sensitivity=93.2%, specificity=91.7%) and 8.2 nmol/L in ELISA (sensitivity=93.7%, specificity=92.3%). In multivariate analysis, higher basal 17OHP, lower LH:FSH ratio, and oligomenorrhea were predictors of NCCAH in RIA. Higher basal 17OHP, androstenedione, and total testosterone were predictors of NCCAH in ELISA. A lower LH:FSH ratio showed a similar trend in ELISA. CONCLUSIONS The optimal RIA-based basal 17OHP cutoff was comparable with that recommended in guidelines. The results suggest adopting a higher 17OHP cutoff when using ELISA. The LH:FSH ratio improves the negative predictive value of basal 17OHP.
Affiliation(s)
- Afif Nakhleh
  - Institute of Endocrinology, Diabetes and Metabolism, Rambam Health Care Campus, Haifa, Israel
  - Diabetes and Endocrinology Clinic, Maccabi Healthcare Services, Haifa, Israel
  - *Correspondence: Afif Nakhleh
- Leonard Saiegh
  - Ruth & Bruce Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel
  - Department of Endocrinology, Bnai Zion Medical Center, Haifa, Israel
- Naim Shehadeh
  - Institute of Endocrinology, Diabetes and Metabolism, Rambam Health Care Campus, Haifa, Israel
  - Diabetes and Endocrinology Clinic, Maccabi Healthcare Services, Haifa, Israel
  - Ruth & Bruce Rappaport Faculty of Medicine, Technion, Israel Institute of Technology, Haifa, Israel
- Naomi Weintrob
  - Department of Pediatrics, Sackler Faculty of Medicine, Tel Aviv University, Tel-Aviv, Israel
  - Pediatric Endocrinology and Diabetes Unit, Dana-Dwek Children’s Hospital, Tel-Aviv Medical Center, Tel-Aviv, Israel
- Lia Supino-Rosin
  - Central Laboratory, Maccabi Healthcare Services, Rehovot, Israel
- Sandra Alboim
  - Central Laboratory, Maccabi Healthcare Services, Rehovot, Israel
- Raya Gendelman
  - The Endocrine Laboratory, Rambam Health Care Campus, Haifa, Israel
- Moshe Zloczower
  - Institute of Endocrinology, Diabetes and Metabolism, Rambam Health Care Campus, Haifa, Israel
32
Gorelik Y, Bloch-Isenberg N, Hashoul S, Heyman SN, Khamaisi M. Hyperglycemia on Admission Predicts Acute Kidney Failure and Renal Functional Recovery among Inpatients. J Clin Med 2021; 11:jcm11010054. [PMID: 35011805 PMCID: PMC8745405 DOI: 10.3390/jcm11010054] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/13/2021] [Revised: 12/18/2021] [Accepted: 12/19/2021] [Indexed: 02/07/2023]
Abstract
BACKGROUND Hyperglycemia is associated with adverse outcomes in hospitalized patients. We aimed to assess the impact of glucose levels upon admission on the subsequent deterioration or improvement of kidney function in inpatients with a focus on diabetes or reduced baseline kidney function as possible modifiers of this effect. METHODS Running a retrospective cohort analysis, we compared patients with normal vs. high glucose levels upon admission. We applied multivariable logistic regression models to study the association between baseline glucose levels with subsequent renal and clinical outcomes. Interaction terms were used to study a possible modifier effect of diabetes. RESULTS Among 95,556 inpatients (52% males, mean age 61 years), 15,675 (16.5%) had plasma glucose higher than 180 mg/dL, and 72% of them were diabetics. Patients with higher glucose at presentation were older, with a higher proportion of co-morbid conditions. Rates of acute kidney injury (AKI), acute kidney functional recovery (AKR), and mortality were proportional to reduced renal function. AKI, AKR, and mortality were almost doubled in patients with high baseline glucose upon admission. Multivariable analysis with interaction terms demonstrated an increasing adjusted probability of all events as glucose increased, yet this association was observed principally in non-diabetic patients. CONCLUSIONS Hyperglycemia is associated with AKI, AKR, and mortality in non-diabetic inpatients in proportion to the severity of their acute illness. This association diminishes in diabetic patients, suggesting a possible impact of treatable and easily reversible renal derangement in this population.
Affiliation(s)
- Yuri Gorelik
  - Department of Medicine D, Rambam Health Care Campus, Haifa 3109601, Israel; (Y.G.); (N.B.-I.); (M.K.)
  - Department of Medicine A, Ruth & Bruce Rappaport Faculty of Medicine, Technion-IIT, Haifa 3109601, Israel
- Natalie Bloch-Isenberg
  - Department of Medicine D, Rambam Health Care Campus, Haifa 3109601, Israel; (Y.G.); (N.B.-I.); (M.K.)
  - Department of Medicine A, Ruth & Bruce Rappaport Faculty of Medicine, Technion-IIT, Haifa 3109601, Israel
- Siwar Hashoul
  - Department of Medicine A, Ruth & Bruce Rappaport Faculty of Medicine, Technion-IIT, Haifa 3109601, Israel
  - Department of Medicine A, Rambam Health Care Campus, Haifa 3109601, Israel
- Samuel N. Heyman
  - Department of Medicine, Hadassah Hebrew University Hospital, Mt. Scopus, Jerusalem 91240, Israel
  - Correspondence:
- Mogher Khamaisi
  - Department of Medicine D, Rambam Health Care Campus, Haifa 3109601, Israel; (Y.G.); (N.B.-I.); (M.K.)
  - Department of Medicine A, Ruth & Bruce Rappaport Faculty of Medicine, Technion-IIT, Haifa 3109601, Israel
33
Marcusohn E, Gibory I, Miller A, Lipsky AM, Neuberger A, Epstein D. The association between the degree of fever as measured in the emergency department and clinical outcomes of hospitalized adult patients. Am J Emerg Med 2021; 52:92-98. [PMID: 34894473 DOI: 10.1016/j.ajem.2021.11.045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2021] [Revised: 11/21/2021] [Accepted: 11/29/2021] [Indexed: 12/29/2022]
Abstract
BACKGROUND Fever is a physiologic response to a wide range of pathologies and one of the most common complaints and clinical signs in the emergency medicine department (ED). The association between fever magnitude and clinical outcomes has been evaluated in specific populations with inconsistent results. OBJECTIVES In this study we aimed to investigate the association between the degree of fever in the ED and clinical outcomes of hospitalized febrile adult patients. METHODS This was a retrospective single-center cohort study of all the patients with maximal body temperature (BT) ≥ 38.0 °C, as recorded during the ED evaluation, who were hospitalized between January 2015 and December 2020. Patients with heatstroke were excluded. The primary outcome was 30-day all-cause mortality and secondary outcomes were intensive care unit (ICU) admission and development of acute kidney injury (AKI). RESULTS Fever was recorded among 8.1% of patients evaluated in the ED. Elevated BT was associated with increased risk of hospital admission (70.3% vs. 49.4%, p < 0.001), 30-day mortality (12.3% vs. 2.6%, p < 0.001), ICU admission (5.7% vs. 2.8%, p < 0.001), and AKI (11.7% vs. 3.8%, p < 0.001). After exclusion of nine patients with heatstroke, 21,252 hospitalized febrile patients were included in the final analysis. BT > 39.7 °C was progressively associated with increased mortality (OR 1.64-2.22, 95% CI 1.16-2.81, p < 0.005) as compared to BT 38.0-38.1 °C. More AKI events were observed in patients with BT > 39.5 °C (OR 1.48-2.91, 95% CI 1.11-3.66, p < 0.007). Temperature between 39.2 and 39.5 °C was associated with lower mortality (OR 0.62-0.71, 95% CI 0.51-0.87, p < 0.001). In a multiple logistic regression analysis BT > 39.9 °C was independently associated with increased mortality and AKI. BT > 39.7 °C was progressively associated with an increased risk of ICU admission.
CONCLUSION Among febrile patients admitted to the hospital, BT > 39.5 °C was associated with adverse clinical course, as compared to patients with lower-grade fever (38.0-38.1 °C). These patients should be flagged on arrival to the ED and likely warrant more aggressive evaluation and treatment.
Affiliation(s)
- Erez Marcusohn
  - Department of Cardiology, Rambam Health Care Campus, Haifa, Israel
- Iftach Gibory
  - Internal Medicine "H" department, Rambam Health Care Campus, Haifa, Israel
- Asaf Miller
  - Medical Intensive Care unit, Rambam Health Care Campus, Haifa, Israel
- Ari M Lipsky
  - Emergency Department, Emek Medical Center, Afula, Israel
- Ami Neuberger
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel; Internal Medicine "B" department, Rambam Health Care Campus, Haifa, Israel
- Danny Epstein
  - Critical Care Division, Rambam Health Care Campus, Haifa, Israel
34
Foraker R, Guo A, Thomas J, Zamstein N, Payne PR, Wilcox A. The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data. J Med Internet Res 2021; 23:e30697. [PMID: 34559671 PMCID: PMC8491642 DOI: 10.2196/30697] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/03/2021] [Revised: 08/24/2021] [Accepted: 09/12/2021] [Indexed: 01/22/2023]
Abstract
BACKGROUND Computationally derived ("synthetic") data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic. OBJECTIVE We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes. METHODS We used the National COVID Cohort Collaborative's instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19-positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-19-related measures and outcomes, and constructing their epidemic curves. We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data. RESULTS For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts. 
CONCLUSIONS This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights.
Affiliation(s)
- Randi Foraker
  - Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, MO, United States
  - Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, MO, United States
- Aixia Guo
  - Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, MO, United States
- Jason Thomas
  - Department of Biomedical and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
- Philip Ro Payne
  - Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, MO, United States
  - Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, MO, United States
- Adam Wilcox
  - Department of Biomedical and Medical Education, School of Medicine, University of Washington, Seattle, WA, United States
35
Korytny A, Klein A, Marcusohn E, Freund Y, Neuberger A, Raz A, Miller A, Epstein D. Hypocalcemia is associated with adverse clinical course in patients with upper gastrointestinal bleeding. Intern Emerg Med 2021; 16:1813-1822. [PMID: 33651325 DOI: 10.1007/s11739-021-02671-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/18/2020] [Accepted: 02/11/2021] [Indexed: 12/26/2022]
Abstract
Acute non-variceal upper gastrointestinal bleeding (NV-UGIB) is associated with significant morbidity and mortality. Early and efficient risk stratification can facilitate management and improve outcomes. We aimed to determine whether the level of ionized calcium (Ca++), an essential co-factor in the coagulation cascade, is associated with the severity of bleeding and the need for advanced interventions among these patients. This was a retrospective single-center cohort study of all patients admitted due to NV-UGIB. The primary outcome was transfusion of ≥ 2 packed red blood cells, arterial embolization, or emergency surgery. Secondary outcomes included (1) transfusion of ≥ 2 packed red blood cells, (2) arterial embolization or emergency surgery, and (3) all-cause in-hospital mortality. Multivariable logistic regression was performed to determine whether Ca++ was an independent predictor of these adverse outcomes. A total of 1345 patients were included. Hypocalcemia was recorded in 604 (44.9%) patients. The rate of the primary adverse outcome was significantly higher in the hypocalcemic group (14.4% vs. 5.1%, p < 0.001). Secondary outcomes (multiple transfusions, need for angiography or surgery, and mortality) were also increased (9.9% vs. 2.3%, p < 0.001, 5.3% vs. 2.8%, p = 0.03, and 33.3% vs. 24.7%, p < 0.001, respectively). Hypocalcemia was an independent predictor of the primary outcome and of all the secondary outcomes except mortality. Hypocalcemia in high-risk hospitalized patients with NV-UGIB is common and independently associated with adverse outcomes. Ca++ monitoring in this population may facilitate the rapid identification of high-risk patients. Trials are needed to assess whether correction of hypocalcemia will lead to improved outcomes.
Affiliation(s)
- Alexander Korytny
  - Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
- Amir Klein
  - Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
- Erez Marcusohn
  - Department of Cardiology, Rambam Health Care Campus, Haifa, Israel
- Yaacov Freund
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
- Ami Neuberger
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
  - Infectious Diseases Unit, Rambam Health Care Campus, Haifa, Israel
  - Department of Internal Medicine "B", Rambam Health Care Campus, Haifa, Israel
- Aeyal Raz
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
  - Department of Anesthesiology, Rambam Health Care Campus, Haifa, Israel
- Asaf Miller
  - Medical Intensive Care Unit, Rambam Health Care Campus, Haifa, Israel
- Danny Epstein
  - Critical Care Division, Rambam Health Care Campus, HaAliya HaShniya St. 8, 3109601, Haifa, Israel
36
Brzezinski RY, Rabin N, Lewis N, Peled R, Kerpel A, Tsur AM, Gendelman O, Naftali-Shani N, Gringauz I, Amital H, Leibowitz A, Mayan H, Ben-Zvi I, Heller E, Shechtman L, Rogowski O, Shenhar-Tsarfaty S, Konen E, Marom EM, Ironi A, Rahav G, Zimmer Y, Grossman E, Ovadia-Blechman Z, Leor J, Hoffer O. Automated processing of thermal imaging to detect COVID-19. Sci Rep 2021; 11:17489. [PMID: 34471180 PMCID: PMC8410809 DOI: 10.1038/s41598-021-96900-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/18/2021] [Accepted: 08/17/2021] [Indexed: 01/08/2023]
Abstract
Rapid and sensitive screening tools for SARS-CoV-2 infection are essential to limit the spread of COVID-19 and to properly allocate national resources. Here, we developed a new point-of-care, non-contact thermal imaging tool to detect COVID-19, based on advanced image processing algorithms. We captured thermal images of the backs of individuals with and without COVID-19 using a portable thermal camera that connects directly to smartphones. Our novel image processing algorithms automatically extracted multiple texture and shape features of the thermal images and achieved an area under the curve (AUC) of 0.85 in COVID-19 detection with up to 92% sensitivity. Thermal imaging scores were inversely correlated with clinical variables associated with COVID-19 disease progression. In summary, we show, for the first time, that a hand-held thermal imaging device can be used to detect COVID-19. Non-invasive thermal imaging could be used to screen for COVID-19 in out-of-hospital settings, especially in low-income regions with limited imaging resources.
Affiliation(s)
- Rafael Y Brzezinski
  - Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
- Neta Rabin
  - Faculty of Engineering, Tel-Aviv University, Tel Aviv, Israel
- Nir Lewis
  - Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
- Racheli Peled
  - Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
- Ariel Kerpel
  - Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Avishai M Tsur
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
  - Israel Defense Forces, Medical Corps, Ramat Gan, Israel
- Omer Gendelman
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
- Nili Naftali-Shani
  - Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
- Irina Gringauz
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Geriatrics Division, Sheba Medical Center, Tel Hashomer, Israel
- Howard Amital
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
- Avshalom Leibowitz
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
- Haim Mayan
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
- Ilan Ben-Zvi
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
- Eyal Heller
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
- Liran Shechtman
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
- Ori Rogowski
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine C, D, and E, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Shani Shenhar-Tsarfaty
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine C, D, and E, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
- Eli Konen
  - Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Edith M Marom
  - Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Israel
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Avinoah Ironi
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Department of Emergency Medicine, Sheba Medical Center, Tel Hashomer, Israel
- Galia Rahav
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Infectious Disease Unit, Sheba Medical Center, Tel Hashomer, Israel
- Yair Zimmer
  - School of Medical Engineering, Afeka Tel Aviv Academic College of Engineering, Tel Aviv, Israel
- Ehud Grossman
  - Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Internal Medicine Wing and Hypertension Unit, Sheba Medical Center, Tel Hashomer, Israel
- Zehava Ovadia-Blechman
  - School of Medical Engineering, Afeka Tel Aviv Academic College of Engineering, Tel Aviv, Israel
- Jonathan Leor
  - Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
- Oshrit Hoffer
  - School of Electrical Engineering, Afeka Tel Aviv Academic College of Engineering, Tel Aviv, Israel
37
Weber Y, Epstein D, Miller A, Segal G, Berger G. Association of Low Alanine Aminotransferase Values with Extubation Failure in Adult Critically Ill Patients: A Retrospective Cohort Study. J Clin Med 2021; 10:jcm10153282. [PMID: 34362065 PMCID: PMC8348471 DOI: 10.3390/jcm10153282] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2021] [Revised: 07/21/2021] [Accepted: 07/23/2021] [Indexed: 11/16/2022]
Abstract
Background: Liberation from mechanical ventilation is a cardinal landmark during hospitalization of ventilated patients. Decreased muscle mass and sarcopenia are associated with a high risk of extubation failure. A low level of alanine aminotransferase (ALT) is a known biomarker of sarcopenia. This study aimed to determine whether low levels of ALT are associated with increased risk of extubation failure among critically ill patients. Methods: This was a retrospective single-center cohort study of mechanically ventilated patients undergoing their first extubation. The study’s outcome was extubation failure within 48 h and 7 days. Multivariable logistic and Cox regression were performed to determine whether ALT was an independent predictor of these outcomes. Results: The study included 329 patients with a median age of 62.4 years (IQR 48.1–71.2); 210 (63.8%) patients were at high risk for extubation failure. 66 (20.1%) and 83 (25.2%) failed the extubation attempt after 48 h and 7 days, respectively. Low ALT values were more common among patients requiring reintubation (80.3–61.5% vs. 58.6–58.9%, p < 0.002). Multivariable logistic regression analysis identified ALT as an independent predictor of extubation failure at 48 h and 7 days. ALT ≤ 21 IU/L had an adjusted hazard ratio (HR) of 2.41 (95% CI 1.31–4.42, p < 0.001) for extubation failure at 48 h and ALT ≤ 16 IU/L had adjusted HR of 1.94 (95% CI 1.25–3.02, p < 0.001) for failure after 7 days. Conclusions: Low ALT, an established biomarker of sarcopenia and frailty, is an independent risk factor for extubation failure among hospitalized patients. This simple laboratory parameter can be used as an effective adjunct predictor, along with other weaning parameters, and thereby facilitate the identification of high-risk patients.
Affiliation(s)
- Yoav Weber
  - Department of Internal Medicine “B”, Rambam Health Care Campus, Haifa 3109601, Israel; (D.E.); (G.B.)
  - Correspondence: Tel.: +972-054-9249749
- Danny Epstein
  - Department of Internal Medicine “B”, Rambam Health Care Campus, Haifa 3109601, Israel; (D.E.); (G.B.)
  - Critical Care Division, Rambam Health Care Campus, Haifa 3109601, Israel
- Asaf Miller
  - Medical Intensive Care Unit, Rambam Health Care Campus, Haifa 3109601, Israel
- Gad Segal
  - Department of Internal Medicine “T”, Chaim Sheba Medical Center, Tel-Hashomer, Ramat Gan 6971039, Israel
  - Sackler Faculty of Medicine, Tel-Aviv University, Ramat-Aviv 6997801, Israel
- Gidon Berger
  - Department of Internal Medicine “B”, Rambam Health Care Campus, Haifa 3109601, Israel; (D.E.); (G.B.)
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion-Israel Institute of Technology, Haifa 3109601, Israel
38
Thomas JA, Foraker RE, Zamstein N, Payne PR, Wilcox AB. Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: Results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). medRxiv 2021:2021.07.06.21259051. [PMID: 34268525 PMCID: PMC8282114 DOI: 10.1101/2021.07.06.21259051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
OBJECTIVE To evaluate whether synthetic data derived from a national COVID-19 data set could be used for geospatial and temporal epidemic analyses. MATERIALS AND METHODS Using an original data set (n=1,854,968 SARS-CoV-2 tests) and its synthetic derivative, we compared key indicators of COVID-19 community spread through analysis of aggregate and zip-code level epidemic curves, patient characteristics and outcomes, distribution of tests by zip code, and indicator counts stratified by month and zip code. Similarity between the data was statistically and qualitatively evaluated. RESULTS In general, synthetic data closely matched original data for epidemic curves, patient characteristics, and outcomes. Synthetic data suppressed labels of zip codes with few total tests (mean=2.9±2.4; max=16 tests; 66% reduction of unique zip codes). Epidemic curves and monthly indicator counts were similar between synthetic and original data in a random sample of the most tested (top 1%; n=171) and for all unsuppressed zip codes (n=5,819), respectively. In small sample sizes, synthetic data utility was notably decreased. DISCUSSION Analyses on the population-level and of densely-tested zip codes (which contained most of the data) were similar between original and synthetically-derived data sets. Analyses of sparsely-tested populations were less similar and had more data suppression. CONCLUSION In general, synthetic data were successfully used to analyze geospatial and temporal trends. Analyses using small sample sizes or populations were limited, in part due to purposeful data label suppression, an attribute disclosure countermeasure. Users should consider data fitness for use in these cases.
Affiliation(s)
- Jason A. Thomas
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA, USA
- Randi E. Foraker
  - Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Philip R.O. Payne
  - Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
  - Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, MO, USA
- Adam B. Wilcox
  - Department of Biomedical Informatics & Medical Education, University of Washington, Seattle, WA, USA
  - UW Medicine, Seattle, WA, USA
39
Pereira T, Morgado J, Silva F, Pelter MM, Dias VR, Barros R, Freitas C, Negrão E, Flor de Lima B, Correia da Silva M, Madureira AJ, Ramos I, Hespanhol V, Costa JL, Cunha A, Oliveira HP. Sharing Biomedical Data: Strengthening AI Development in Healthcare. Healthcare (Basel) 2021; 9:healthcare9070827. [PMID: 34208830 PMCID: PMC8303863 DOI: 10.3390/healthcare9070827] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/03/2021] [Revised: 06/11/2021] [Accepted: 06/22/2021] [Indexed: 01/17/2023]
Abstract
Artificial intelligence (AI)-based solutions have revolutionized our world, using extensive datasets and computational resources to create automatic tools for complex tasks that, until now, have been performed by humans. Massive data is a fundamental aspect of the most powerful AI-based algorithms. However, for AI-based healthcare solutions, there are several socioeconomic, technical/infrastructural, and most importantly, legal restrictions, which limit the large-scale collection of and access to biomedical data, especially medical imaging. To overcome this important limitation, several alternative solutions have been suggested, including transfer learning approaches, generation of artificial data, adoption of blockchain technology, and creation of an infrastructure composed of anonymous and abstract data. However, none of these strategies is currently able to completely solve this challenge. The need to build large datasets that can be used to develop healthcare solutions deserves special attention from the scientific community, clinicians, all the healthcare players, engineers, ethicists, legislators, and society in general. This paper offers an overview of the data limitation in medical predictive models; its impact on the development of healthcare solutions; benefits and barriers of sharing data; and finally, suggests future directions to overcome data limitations in the medical field and enable AI to enhance healthcare. This perspective is dedicated to the technical requirements of the learning models; it explains the limitations that come from poor and small datasets in the medical domain and the technical options that attempt to solve, or can solve, the problem of the lack of massive healthcare data.
Collapse
Affiliation(s)
- Tania Pereira
  - INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
- Joana Morgado
  - INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
  - FCUP—Faculty of Science, University of Porto, 4169-007 Porto, Portugal
- Francisco Silva
  - INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
- Michele M. Pelter
  - Department of Physiological Nursing, School of Nursing, University of California, San Francisco, CA 94143, USA
- Vasco Rosa Dias
  - INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
- Rita Barros
  - INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
- Cláudia Freitas
  - CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal
  - FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
- Eduardo Negrão
  - CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal
- Beatriz Flor de Lima
  - CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal
- Miguel Correia da Silva
  - CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal
- António J. Madureira
  - CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal
  - FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
- Isabel Ramos
  - CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal
  - FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
- Venceslau Hespanhol
  - CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal
  - FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
- José Luis Costa
  - FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal
  - i3S—Institute for Research and Innovation in Health of the University of Porto, 4200-135 Porto, Portugal
  - IPATIMUP—Institute of Molecular Pathology and Immunology of the University of Porto, 4200-135 Porto, Portugal
- António Cunha
  - INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
  - UTAD—University of Trás-os-Montes and Alto Douro, 5001-801 Vila Real, Portugal
- Hélder P. Oliveira
  - INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal
  - FCUP—Faculty of Science, University of Porto, 4169-007 Porto, Portugal
40
Azizi Z, Zheng C, Mosquera L, Pilote L, El Emam K. Can synthetic data be a proxy for real clinical trial data? A validation study. BMJ Open 2021; 11:e043497. [PMID: 33863713 PMCID: PMC8055130 DOI: 10.1136/bmjopen-2020-043497] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 08/06/2020] [Revised: 01/14/2021] [Accepted: 03/18/2021] [Indexed: 11/03/2022]
Abstract
OBJECTIVES There are increasing requirements to make research data, especially clinical trial data, more broadly available for secondary analyses. However, data availability remains a challenge due to complex privacy requirements. This challenge can potentially be addressed using synthetic data. SETTING Replication of a published stage III colon cancer trial secondary analysis using synthetic data generated by a machine learning method. PARTICIPANTS There were 1543 patients in the control arm that were included in our analysis. PRIMARY AND SECONDARY OUTCOME MEASURES Analyses from a study published on the real dataset were replicated on synthetic data to investigate the relationship between bowel obstruction and event-free survival. Information theoretic metrics were used to compare the univariate distributions between real and synthetic data. Percentage CI overlap was used to assess the similarity in the size of the bivariate relationships, and similarly for the multivariate Cox models derived from the two datasets. RESULTS Analysis results were similar between the real and synthetic datasets. The univariate distributions were within 1% of difference on an information theoretic metric. All of the bivariate relationships had CI overlap on the tau statistic above 50%. The main conclusion from the published study, that lack of bowel obstruction has a strong impact on survival, was replicated directionally and the HR CI overlap between the real and synthetic data was 61% for overall survival (real data: HR 1.56, 95% CI 1.11 to 2.2; synthetic data: HR 2.03, 95% CI 1.44 to 2.87) and 86% for disease-free survival (real data: HR 1.51, 95% CI 1.18 to 1.95; synthetic data: HR 1.63, 95% CI 1.26 to 2.1). CONCLUSIONS The high concordance between the analytical results and conclusions from synthetic and real data suggests that synthetic data can be used as a reasonable proxy for real clinical trial datasets. TRIAL REGISTRATION NUMBER NCT00079274.
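The abstract does not state how the percentage CI overlap is computed. The following is a minimal Python sketch, assuming the commonly used definition in which the length of the intersection of the two intervals is divided by the length of each interval and the two fractions are averaged. The function name `ci_overlap` is illustrative, not taken from the study. Under this assumption, the hazard-ratio intervals quoted in the abstract give values of about 0.61 and 0.86, which is consistent with the reported 61% and 86%.

```python
def ci_overlap(real, synth):
    """Average fraction of each confidence interval covered by their intersection.

    `real` and `synth` are (lower, upper) tuples. This definition is an
    assumption; the abstract does not give the formula.
    """
    (l_r, u_r), (l_s, u_s) = real, synth
    inter = min(u_r, u_s) - max(l_r, l_s)  # length of the shared segment
    if inter <= 0:
        return 0.0  # disjoint intervals have no overlap
    return 0.5 * (inter / (u_r - l_r) + inter / (u_s - l_s))


# Hazard-ratio intervals as quoted in the abstract (real data, synthetic data)
os_overlap = ci_overlap((1.11, 2.20), (1.44, 2.87))   # overall survival
dfs_overlap = ci_overlap((1.18, 1.95), (1.26, 2.10))  # disease-free survival
print(f"{os_overlap:.2f} {dfs_overlap:.2f}")
```

An overlap of 1.0 means the two intervals coincide, and 0.0 means they are disjoint, so higher values indicate closer agreement between estimates from the real and synthetic data.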
Affiliation(s)
- Zahra Azizi
  - Center for Outcomes Research and Evaluation, Faculty of Medicine, McGill University, Montreal, Québec, Canada
- Chaoyi Zheng
  - Data Science, Replica Analytics Ltd, Ottawa, Ontario, Canada
- Lucy Mosquera
  - Data Science, Replica Analytics Ltd, Ottawa, Ontario, Canada
- Louise Pilote
  - Medicine, McGill University, Montreal, Québec, Canada
  - Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, Montreal, Québec, Canada
- Khaled El Emam
  - Electronic Health Information Laboratory, Children's Hospital of Eastern Ontario Research Institute, Ottawa, Ontario, Canada
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, Ontario, Canada
41
Kaur D, Sobiesk M, Patil S, Liu J, Bhagat P, Gupta A, Markuzon N. Application of Bayesian networks to generate synthetic health data. J Am Med Inform Assoc 2021; 28:801-811. [PMID: 33367620 DOI: 10.1093/jamia/ocaa303] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/17/2020] [Accepted: 11/16/2020] [Indexed: 01/08/2023]
Abstract
OBJECTIVE This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data. MATERIALS AND METHODS We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data. RESULTS Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules. DISCUSSION Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools. CONCLUSION We conclude the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.
Affiliation(s)
- Dhamanpreet Kaur
  - Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Matthew Sobiesk
  - Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Shubham Patil
  - Rochester Institute of Technology, Rochester, New York, USA
- Jin Liu
  - Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA
- Puran Bhagat
  - Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA
- Amar Gupta
  - Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Natasha Markuzon
  - Clinical Informatics, Philips Research North America, Cambridge, Massachusetts, USA
42
Fake It Till You Make It: Guidelines for Effective Synthetic Data Generation. Applied Sciences-Basel 2021. [DOI: 10.3390/app11052158] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/18/2022]
Abstract
Synthetic data provides a privacy protecting mechanism for the broad usage and sharing of healthcare data for secondary purposes. It is considered a safe approach for the sharing of sensitive data as it generates an artificial dataset that contains no identifiable information. Synthetic data is increasing in popularity with multiple synthetic data generators developed in the past decade, yet its utility is still a subject of research. This paper is concerned with evaluating the effect of various synthetic data generation and usage settings on the utility of the generated synthetic data and its derived models. Specifically, we investigate (i) the effect of data pre-processing on the utility of the synthetic data generated, (ii) whether tuning should be applied to the synthetic datasets when generating supervised machine learning models, and (iii) whether sharing preliminary machine learning results can improve the synthetic data models. Lastly, (iv) we investigate whether one utility measure (Propensity score) can predict the accuracy of the machine learning models generated from the synthetic data when employed in real life. We use two popular measures of synthetic data utility, propensity score and classification accuracy, to compare the different settings. We adopt a recent mechanism for the calculation of propensity, which looks carefully into the choice of model for the propensity score calculation. Accordingly, this paper takes a new direction with investigating the effect of various data generation and usage settings on the quality of the generated data and its ensuing models. The goal is to inform on the best strategies to follow when generating and using synthetic data.
43
Vourganas I, Stankovic V, Stankovic L. Individualised Responsible Artificial Intelligence for Home-Based Rehabilitation. Sensors (Basel, Switzerland) 2020; 21:E2. [PMID: 33374913 PMCID: PMC7792599 DOI: 10.3390/s21010002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/06/2020] [Revised: 12/09/2020] [Accepted: 12/17/2020] [Indexed: 01/23/2023]
Abstract
Socioeconomic reasons post-COVID-19 demand unsupervised home-based rehabilitation and, specifically, artificial ambient intelligence with individualisation to support engagement and motivation. Artificial intelligence must also comply with accountability, responsibility, and transparency (ART) requirements for wider acceptability. This paper presents such a patient-centric individualised home-based rehabilitation support system. To this end, the Timed Up and Go (TUG) and Five Time Sit To Stand (FTSTS) tests evaluate daily living activity performance in the presence or development of comorbidities. We present a method for generating synthetic datasets complementing experimental observations and mitigating bias. We present an incremental hybrid machine learning algorithm combining ensemble learning and hybrid stacking using extreme gradient boosted decision trees and k-nearest neighbours to meet individualisation, interpretability, and ART design requirements while maintaining low computation footprint. The model reaches up to 100% accuracy for both FTSTS and TUG in predicting associated patient medical condition, and 100% or 83.13%, respectively, in predicting area of difficulty in the segments of the test. Our results show an improvement of 5% and 15% for FTSTS and TUG tests, respectively, over previous approaches that use intrusive means of monitoring such as cameras.
Affiliation(s)
- Ioannis Vourganas
  - Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow G1 1XW, UK
44
Epstein D, Solomon N, Korytny A, Marcusohn E, Freund Y, Avrahami R, Neuberger A, Raz A, Miller A. Association between ionised calcium and severity of postpartum haemorrhage: a retrospective cohort study. Br J Anaesth 2020; 126:1022-1028. [PMID: 33341222 DOI: 10.1016/j.bja.2020.11.020] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/04/2020] [Revised: 10/12/2020] [Accepted: 11/03/2020] [Indexed: 01/07/2023]
Abstract
BACKGROUND Postpartum haemorrhage (PPH) is often complicated by impaired coagulation. We aimed to determine whether the level of ionised calcium (Ca²⁺), an essential coagulation co-factor, at diagnosis of PPH is associated with bleeding severity. METHODS This was a retrospective cohort study of women diagnosed with PPH during vaginal delivery between January 2009 and April 2020. Ca²⁺ levels at PPH diagnosis were compared between women who progressed to severe PPH (primary outcome) and those with less severe bleeding. Severe PPH was defined by transfusion of ≥2 blood units, arterial embolisation or emergency surgery, admission to ICU, or death. Associations between other variables (e.g. fibrinogen concentration) and bleeding severity were also assessed. RESULTS For 436 patients included in the analysis, hypocalcaemia was more common among patients with severe PPH (51.5% vs 10.6%, P<0.001). In a multivariable logistic regression model, Ca²⁺ and fibrinogen were the only parameters independently associated with PPH severity with odds ratios of 1.14 for each 10 mg dl⁻¹ decrease in fibrinogen (95% confidence interval [CI], 1.05-1.24; P=0.002) and 1.97 for each 0.1 mmol L⁻¹ decrease in Ca²⁺ (95% CI, 1.25-3.1; P=0.003). The performance of Ca²⁺ or fibrinogen was not significantly different (area under the curve [AUC]=0.79 [95% CI, 0.75-0.83] vs AUC=0.86 [95% CI, 0.82-0.9]; P=0.09). The addition of Ca²⁺ to fibrinogen improved the model, leading to AUC of 0.9 (95% CI, 0.86-0.93), P=0.03. CONCLUSIONS Ca²⁺ level at the time of diagnosis of PPH was associated with risk of severe bleeding. Ca²⁺ monitoring may facilitate identification and treatment of high-risk patients.
Affiliation(s)
- Danny Epstein
  - Internal Medicine "B" Department, Rambam Health Care Campus, Haifa, Israel
- Neta Solomon
  - Department of Obstetrics and Gynecology, Lis Maternity Hospital, Sourasky Medical Center, Tel Aviv, Israel; Sackler School of Medicine, Tel Aviv University, Ramat-Aviv, Israel
- Alexander Korytny
  - Department of Gastroenterology, Rambam Health Care Campus, Haifa, Israel; Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
- Erez Marcusohn
  - Department of Cardiology, Rambam Health Care Campus, Haifa, Israel
- Yaacov Freund
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel
- Ron Avrahami
  - Obstetrics and Gynecology Division, Rambam Health Care Campus, Haifa, Israel
- Ami Neuberger
  - Internal Medicine "B" Department, Rambam Health Care Campus, Haifa, Israel; Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel; Infectious Diseases Unit, Rambam Health Care Campus, Haifa, Israel
- Aeyal Raz
  - Ruth and Bruce Rappaport Faculty of Medicine, Technion, Haifa, Israel; Department of Anesthesiology, Rambam Health Care Campus, Haifa, Israel
- Asaf Miller
  - Medical Intensive Care Unit, Rambam Health Care Campus, Haifa, Israel
45
Foraker RE, Yu SC, Gupta A, Michelson AP, Pineda Soto JA, Colvin R, Loh F, Kollef MH, Maddox T, Evanoff B, Dror H, Zamstein N, Lai AM, Payne PRO. Spot the difference: comparing results of analyses from real patient data and synthetic derivatives. JAMIA Open 2020; 3:557-566. [PMID: 33623891 PMCID: PMC7886551 DOI: 10.1093/jamiaopen/ooaa060] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 08/14/2020] [Revised: 10/14/2020] [Accepted: 10/20/2020] [Indexed: 12/19/2022]
Abstract
BACKGROUND Synthetic data may provide a solution to researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification. OBJECTIVES To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns. METHODS We explored three use cases and tested the robustness of synthetic data by comparing the results of analyses using synthetic derivatives to analyses using the original data using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases with the purpose of conducting analyses at the observation level (Use Case 1), patient cohorts (Use Case 2), and population-level data (Use Case 3). RESULTS For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions. DISCUSSION AND CONCLUSION This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
Affiliation(s)
- Randi E Foraker
  - Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
  - Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Sean C Yu
  - Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Aditi Gupta
  - Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Andrew P Michelson
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Jose A Pineda Soto
  - Division of Critical Care Medicine, Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Los Angeles, Los Angeles, California, USA
- Ryan Colvin
  - Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
  - Division of Critical Care Medicine, Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Los Angeles, Los Angeles, California, USA
- Francis Loh
  - School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Marin H Kollef
  - Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Thomas Maddox
  - Healthcare Innovation Lab, BJC Healthcare, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Bradley Evanoff
  - Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Albert M Lai
  - Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
  - Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
- Philip R O Payne
  - Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
  - Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
46
Jeon S, Seo J, Kim S, Lee J, Kim JH, Sohn JW, Moon J, Joo HJ. Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models. J Med Internet Res 2020; 22:e19597. [PMID: 33177037 PMCID: PMC7728527 DOI: 10.2196/19597] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/26/2020] [Revised: 07/29/2020] [Accepted: 11/11/2020] [Indexed: 02/01/2023]
Abstract
Background De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy. Objective This study proposes and evaluates a de-identification strategy that is comprised of several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using the actual CDM database. Methods The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models. Results The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: The highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification. 
Conclusions Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.
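To make the k-anonymity and "highest risk" figures in this abstract concrete, here is a minimal Python sketch on invented toy records. It is not the ARX framework's API and not the authors' procedure. The column names, the function `k_anonymity_report`, and the risk definition (1 divided by the size of a record's equivalence class) are assumptions for illustration only.

```python
from collections import Counter


def k_anonymity_report(records, quasi_identifiers):
    """Summarise equivalence classes formed by the quasi-identifier columns.

    Assumes per-record re-identification risk is 1 / (size of the record's
    equivalence class); this is an illustrative definition, not ARX output.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    sizes = Counter(keys)                 # equivalence class -> record count
    risks = [1 / sizes[k] for k in keys]  # one risk value per record
    return {
        "k": min(sizes.values()),         # dataset is k-anonymous for this k
        "highest_risk": max(risks),
        "average_risk": sum(risks) / len(risks),
    }


# Hypothetical toy records; the field names are not from the CDM schema.
records = [
    {"birth_year": 1960, "sex": "F", "zip3": "021"},
    {"birth_year": 1960, "sex": "F", "zip3": "021"},
    {"birth_year": 1975, "sex": "M", "zip3": "021"},
    {"birth_year": 1975, "sex": "M", "zip3": "021"},
    {"birth_year": 1988, "sex": "F", "zip3": "100"},
]
report = k_anonymity_report(records, ["birth_year", "sex", "zip3"])
print(report)
```

In this toy data a single record with a unique combination of quasi-identifiers drives the highest risk to 100% even though most records share their class with another record, which mirrors the abstract's point that a low average risk can coexist with a "highest risk" value of 100%.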
Affiliation(s)
- Seungho Jeon
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Jeongeun Seo
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Sukyoung Kim
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Jeongmoon Lee
  - Korea University Research Institute for Medical Bigdata Science, Korea University, Seoul, Republic of Korea
- Jong-Ho Kim
  - Department of Cardiology, Cardiovascular Center, Korea University, Seoul, Republic of Korea
- Jang Wook Sohn
  - Division of Infectious Diseases, Department of Internal Medicine, College of Medicine, Korea University, Seoul, Republic of Korea
- Jongsub Moon
  - Division of Information Security, Graduate School of Information Security, Korea University, Seoul, Republic of Korea
- Hyung Joon Joo
  - Department of Internal Medicine, Korea University College of Medicine, Korea University, Seoul, Republic of Korea
47
Gillies CE, Taylor DF, Cummings BC, Ansari S, Islim F, Kronick SL, Medlin RP, Ward KR. Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution. J Biomed Inform 2020; 110:103528. [PMID: 32795506 DOI: 10.1016/j.jbi.2020.103528] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/17/2020] [Revised: 05/20/2020] [Accepted: 08/03/2020] [Indexed: 01/04/2023]
Abstract
When using tree-based methods to develop predictive analytics and early warning systems for preventive healthcare, it is important to use an appropriate imputation method to prevent learning the missingness pattern. To demonstrate this, we developed a novel simulation that generated synthetic electronic health record data using a variational autoencoder with a custom loss function, which took into account the high missing rate of electronic health data. We showed that when tree-based methods learn missingness patterns (correlated with adverse events) in electronic health record data, this leads to decreased performance if the system is used in a new setting that has different missingness patterns. Performance is worst in this scenario when the missing rate between those with and without an adverse event is the greatest. We found that randomized and Bayesian regression imputation methods mitigate the issue of learning the missingness pattern for tree-based methods. We used this information to build a novel early warning system for predicting patient deterioration in general wards and telemetry units: PICTURE (Predicting Intensive Care Transfers and other UnfoReseen Events). To develop, tune, and test PICTURE, we used labs and vital signs from electronic health records of adult patients over four years (n = 133,089 encounters). We analyzed primary outcomes of unplanned intensive care unit transfer, emergency vasoactive medication administration, cardiac arrest, and death. We compared PICTURE with existing early warning systems and logistic regression at multiple levels of granularity. 
When analyzing PICTURE on the testing set using all observations within a hospital encounter (event rate = 3.4%), PICTURE had an area under the receiver operating characteristic curve (AUROC) of 0.83 and an adjusted (event rate = 4%) area under the precision-recall curve (AUPR) of 0.27, while the next best tested method, regularized logistic regression, had an AUROC of 0.80 and an adjusted AUPR of 0.22. To ensure system interpretability, we applied a state-of-the-art prediction explainer that provided a ranked list of features contributing most to the prediction. Though it is currently difficult to compare machine learning-based early warning systems, a rudimentary comparison with published scores demonstrated that PICTURE is on par with state-of-the-art machine learning systems. To facilitate more robust comparisons and development of early warning systems in the future, we have released our variational autoencoder's code and weights so researchers can (a) test their models on data similar to our institution and (b) make their own synthetic datasets.
Affiliation(s)
- Christopher E Gillies
  - Department of Emergency Medicine, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States; Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, United States
- Daniel F Taylor
  - Department of Emergency Medicine, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Brandon C Cummings
  - Department of Emergency Medicine, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Sardar Ansari
  - Department of Emergency Medicine, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Fadi Islim
  - School of Nursing, United States; Michigan Dialysis Services, Canton, MI, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Steven L Kronick
  - Department of Emergency Medicine, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Richard P Medlin
  - Department of Emergency Medicine, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States
- Kevin R Ward
  - Department of Emergency Medicine, United States; Department of Biomedical Engineering, United States; Michigan Center for Integrative Research in Critical Care (MCIRCC), United States; Michigan Institute for Data Science (MIDAS), University of Michigan, Ann Arbor, United States
48
Rankin D, Black M, Bond R, Wallace J, Mulvenna M, Epelde G. Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing. JMIR Med Inform 2020; 8:e18910. [PMID: 32501278 PMCID: PMC7400044 DOI: 10.2196/18910] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 03/26/2020] [Revised: 04/24/2020] [Accepted: 06/04/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce. OBJECTIVE This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data. METHODS A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression trees, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed. RESULTS A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18%) to 0.193 (19%), while other models have lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26% (5/19) of cases for classification and regression tree and parametric synthetic data and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95% (18/19) of cases. This is not the case for models trained on synthetic data.
When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility. CONCLUSIONS The results of this study are promising with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.
Affiliation(s)
- Debbie Rankin
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Derry~Londonderry, United Kingdom
- Michaela Black
  - School of Computing, Engineering and Intelligent Systems, Ulster University, Derry~Londonderry, United Kingdom
- Raymond Bond
  - School of Computing, Ulster University, Jordanstown, United Kingdom
- Jonathan Wallace
  - School of Computing, Ulster University, Jordanstown, United Kingdom
- Maurice Mulvenna
  - School of Computing, Ulster University, Jordanstown, United Kingdom
- Gorka Epelde
  - Vicomtech Foundation, Basque Research and Technology Alliance, Donostia-San Sebastián, Spain
  - Biodonostia Health Research Institute, eHealth Group, Donostia-San Sebastián, Spain