Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Reiner Benaim A, Almog R, Gorelik Y, Hochberg I, Nassar L, Mashiach T, Khamaisi M, Lurie Y, Azzam ZS, Khoury J, Kurnik D, Beyar R. Analyzing Medical Research Results Based on Synthetic Data and Their Relation to Real Data Results: Systematic Comparison From Five Observational Studies. JMIR Med Inform 2020;8:e16492. [PMID: 32130148 PMCID: PMC7059086 DOI: 10.2196/16492] [Citation(s) in RCA: 47] [Impact Index Per Article: 11.8] [Received: 10/03/2019] [Revised: 12/01/2019] [Accepted: 12/27/2019] [Indexed: 12/16/2022] Open

Cited by Other Article(s)

Ramgopal S, Belanger T, Lorenz D, Lipsett SC, Neuman MI, Liebovitz D, Florin TA. Preferences for Management of Pediatric Pneumonia: A Clinician Survey of Artificially Generated Patient Cases. Pediatr Emerg Care 2024:00006565-990000000-00488. [PMID: 38950412 DOI: 10.1097/pec.0000000000003231] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/03/2024]

Abstract

BACKGROUND

It is unknown which factors are associated with chest radiograph (CXR) and antibiotic use for suspected community-acquired pneumonia (CAP) in children. We evaluated factors associated with CXR and antibiotic preferences among clinicians for children with suspected CAP using case scenarios generated through artificial intelligence (AI).

METHODS

We performed a survey of general pediatric, pediatric emergency medicine, and emergency medicine attending physicians employed by a private physician contractor. Respondents were given 5 unique, AI-generated case scenarios. We used generalized estimating equations to identify factors associated with CXR and antibiotic use. We evaluated the cluster-weighted correlation between clinician suspicion and clinical prediction model risk estimates for CAP using 2 predictive models.

RESULTS

A total of 172 respondents provided responses to 839 scenarios. Factors associated with CXR acquisition (OR [95% CI]) included presence of crackles (4.17 [2.19, 7.95]), prior pneumonia (2.38 [1.32, 4.20]), chest pain (1.90 [1.18, 3.05]), and fever (1.82 [1.32, 2.52]). Factors associated with the decision to use antibiotics before knowledge of CXR results included past hospitalization for pneumonia (4.24 [1.88, 9.57]), focal decreased breath sounds (3.86 [1.98, 7.52]), and crackles (3.45 [2.15, 5.53]). After CXR results were revealed to clinicians, these results were the sole predictor associated with antibiotic decision-making. Suspicion for CAP correlated with one of 2 prediction models for CAP (Spearman's rho = 0.25). Factors associated with a greater suspicion of pneumonia included prior pneumonia, duration of illness, worsening course of illness, shortness of breath, vomiting, decreased oral intake or urinary output, respiratory distress, head nodding, focal decreased breath sounds, focal rhonchi, fever, crackles, and lower pulse oximetry.

CONCLUSIONS

Ordering preferences for CXRs demonstrated similarities and differences with evidence-based risk models for CAP. Clinicians relied heavily on CXR findings to guide antibiotic ordering. These findings can be used within decision support systems to promote evidence-based management practices for pediatric CAP.

Brzezinski RY, Wasserman A, Sasson N, Stark M, Goldiner I, Rogowski O, Berliner S, Argov O. An Exploratory Analysis of Routine Ferritin Measurement Upon Admission and the Prognostic Implications of Low-Grade Ferritinemia During Inflammation. Am J Med 2024:S0002-9343(24)00277-8. [PMID: 38723929 DOI: 10.1016/j.amjmed.2024.04.033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 04/16/2024] [Accepted: 04/17/2024] [Indexed: 06/10/2024]

El Emam K, Mosquera L, Fang X, El-Hussuna A. An evaluation of the replicability of analyses using synthetic health data. Sci Rep 2024;14:6978. [PMID: 38521806 PMCID: PMC10960851 DOI: 10.1038/s41598-024-57207-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Accepted: 03/15/2024] [Indexed: 03/25/2024] Open

Abstract

Synthetic data generation is being increasingly used as a privacy preserving approach for sharing health data. In addition to protecting privacy, it is important to ensure that generated data has high utility. A common way to assess utility is the ability of synthetic data to replicate results from the real data. Replicability has been defined using two criteria: (a) replicate the results of the analyses on real data, and (b) ensure valid population inferences from the synthetic data. A simulation study using three heterogeneous real-world datasets evaluated the replicability of logistic regression workloads. Eight replicability metrics were evaluated: decision agreement, estimate agreement, standardized difference, confidence interval overlap, bias, confidence interval coverage, statistical power, and precision (empirical SE). The analysis of synthetic data used a multiple imputation approach whereby up to 20 datasets were generated and the fitted logistic regression models were combined using combining rules for fully synthetic datasets. The effects of synthetic data amplification were evaluated, and two types of generative models were used: sequential synthesis using boosted decision trees and a generative adversarial network (GAN). Privacy risk was evaluated using a membership disclosure metric. For sequential synthesis, adjusted model parameters after combining at least ten synthetic datasets gave high decision and estimate agreement, low standardized difference, high confidence interval overlap, low bias, nominal confidence interval coverage, and power close to the nominal level. Amplification had only a marginal benefit. Confidence interval coverage from a single synthetic dataset without applying combining rules was erroneous, and statistical power, as expected, was artificially inflated when amplification was used. Sequential synthesis performed considerably better than the GAN across multiple datasets. Membership disclosure risk was low for all datasets and models. For replicable results, the statistical analysis of fully synthetic data should be based on at least ten generated datasets of the same size as the original whose analysis results are combined. Analysis results from synthetic data without applying combining rules can be misleading. Replicability results are dependent on the type of generative model used, with our study suggesting that sequential synthesis has good replicability characteristics for common health research workloads.
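The abstract above does not spell out which combining rules were applied. As a minimal sketch, assuming the commonly cited rules for fully synthetic data (Raghunathan, Reiter and Rubin, 2003), one coefficient can be pooled across m synthetic datasets as follows. The function name and the example numbers are hypothetical and are not taken from the study.

```python
from statistics import mean, variance


def combine_fully_synthetic(estimates, variances):
    """Pool one model coefficient across m synthetic datasets.

    Assumption: fully synthetic combining rules, where the total variance is
    T = (1 + 1/m) * b - u_bar, with b the between-dataset variance of the
    point estimates and u_bar the mean within-dataset variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching estimates and variances from m >= 2 datasets")
    q_bar = mean(estimates)      # pooled point estimate
    b = variance(estimates)      # between-dataset variance (divides by m - 1)
    u_bar = mean(variances)      # average within-dataset variance
    total_var = (1 + 1 / m) * b - u_bar
    # total_var can be negative for small m; callers must handle that case.
    return q_bar, total_var


# Hypothetical log-odds estimates and squared standard errors from 3 datasets.
q_bar, total_var = combine_fully_synthetic([0.5, 0.7, 0.6], [0.001, 0.001, 0.001])
print(q_bar, total_var)
```

Because the within-dataset variance is subtracted, the pooled variance can come out negative when few datasets are combined, which is one practical reason to generate many datasets before pooling.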

Lun R, Siegal D, Ramsay T, Stotts G, Dowlatshahi D. Synthetic data in cancer and cerebrovascular disease research: A novel approach to big data. PLoS One 2024;19:e0295921. [PMID: 38324588 PMCID: PMC10849264 DOI: 10.1371/journal.pone.0295921] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Accepted: 12/01/2023] [Indexed: 02/09/2024] Open

Abstract

OBJECTIVES

Synthetic datasets are artificially manufactured based on real health systems data but do not contain real patient information. We sought to validate the use of synthetic data in stroke and cancer research by conducting a comparison study of cancer patients with ischemic stroke to non-cancer patients with ischemic stroke.

DESIGN

Retrospective cohort study.

SETTING

We used synthetic data generated by MDClone and compared it to its original source data (i.e. real patient data from the Ottawa Hospital Data Warehouse).

OUTCOME MEASURES

We compared key differences in demographics, treatment characteristics, length of stay, and costs between cancer patients with ischemic stroke and non-cancer patients with ischemic stroke. We used a binary, multivariable logistic regression model to identify risk factors for recurrent stroke in the cancer population.

RESULTS

Using synthetic data, we found cancer patients with ischemic stroke had a lower prevalence of hypertension (52.0% in the cancer cohort vs 57.7% in the non-cancer cohort, p<0.0001), and a higher prevalence of chronic obstructive pulmonary disease (COPD: 8.5% vs 4.7%, p<0.0001), prior ischemic stroke (1.7% vs 0.1%, p<0.001), and prior venous thromboembolism (VTE: 8.2% vs 1.5%, p<0.0001). They also had a longer length of stay (8 days [IQR 3-16] vs 6 days [IQR 3-13], p = 0.011), and higher costs associated with their stroke encounters: $11,498 (IQR $4,440-$20,668) in the cancer cohort vs $8,084 (IQR $3,947-$16,706) in the non-cancer cohort (p = 0.0061). A multivariable logistic regression model identified 5 predictors for recurrent ischemic stroke in the cancer cohort using synthetic data; 3 of the same predictors were identified using real patient data, with similar effect measures. Summary statistics between synthetic and original datasets did not significantly differ, other than slight differences in the distributions of frequencies for numeric data.

CONCLUSION

We demonstrated the utility of synthetic data in stroke and cancer research and provided key differences between cancer and non-cancer patients with ischemic stroke. Synthetic data is a powerful tool that can allow researchers to easily explore hypothesis generation, enable data sharing without privacy breaches, and ensure broad access to big data in a rapid, safe, and reliable fashion.

Mok H, Ostendorf E, Ganninger A, Adler AJ, Hazan G, Haspel JA. Circadian immunity from bench to bedside: a practical guide. J Clin Invest 2024;134:e175706. [PMID: 38299593 PMCID: PMC10836804 DOI: 10.1172/jci175706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2024] Open

Prasanna A, Jing B, Plopper G, Miller KK, Sanjak J, Feng A, Prezek S, Vidyaprakash E, Thovarai V, Maier EJ, Bhattacharya A, Naaman L, Stephens H, Watford S, Boscardin WJ, Johanson E, Lienau A. Synthetic Health Data Can Augment Community Research Efforts to Better Inform the Public During Emerging Pandemics. medRxiv 2023:2023.12.11.23298687. [PMID: 38168217 PMCID: PMC10760275 DOI: 10.1101/2023.12.11.23298687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]

Giuffrè M, Shung DL. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digit Med 2023;6:186. [PMID: 37813960 PMCID: PMC10562365 DOI: 10.1038/s41746-023-00927-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 04/15/2023] [Accepted: 09/14/2023] [Indexed: 10/11/2023] Open

Ang CYS, Chiew YS, Wang X, Ooi EH, Nor MBM, Cove ME, Chase JG. Virtual patient with temporal evolution for mechanical ventilation trial studies: A stochastic model approach. Comput Methods Programs Biomed 2023;240:107728. [PMID: 37531693 DOI: 10.1016/j.cmpb.2023.107728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Revised: 06/27/2023] [Accepted: 07/19/2023] [Indexed: 08/04/2023]

Abstract

BACKGROUND AND OBJECTIVE

Healthcare datasets are plagued by issues of data scarcity and class imbalance. Clinically validated virtual patient (VP) models can provide accurate in-silico representations of real patients and thus a means for synthetic data generation in hospital critical care settings. This research presents a realistic, time-varying mechanically ventilated respiratory failure VP profile synthesised using a stochastic model.

METHODS

A stochastic model was developed using respiratory elastance (Ers) data from two clinical cohorts and averaged over 30-minute time intervals. The stochastic model was used to generate future Ers data based on current Ers values with added normally distributed random noise. Self-validation of the VPs was performed via Monte Carlo simulation and retrospective Ers profile fitting. A stochastic VP cohort of temporal Ers evolution was synthesised and then compared to data from an independent retrospective patient cohort in a virtual trial across several measured patient responses, where similarity of profiles validates the realism of stochastic model generated VP profiles.
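The idea of generating future Ers values from the current value plus normally distributed noise can be illustrated with a toy random walk. This is a sketch under stated assumptions and not the authors' model: the update rule, the function name `simulate_ers_profile`, the starting value, and the use of a noise standard deviation proportional to the current value are all hypothetical choices made for illustration.

```python
import random


def simulate_ers_profile(ers_start, n_steps=6, noise_fraction=0.05, seed=None):
    """Generate one illustrative elastance (Ers) trajectory.

    Assumption: each step adds Gaussian noise whose standard deviation is
    noise_fraction times the current Ers value. Six steps of 30 minutes
    correspond to a 3-hour profile.
    """
    rng = random.Random(seed)
    profile = [ers_start]
    for _ in range(n_steps):
        current = profile[-1]
        noise = rng.gauss(0.0, noise_fraction * current)
        # Elastance cannot be negative, so clamp at zero.
        profile.append(max(current + noise, 0.0))
    return profile


# Hypothetical starting elastance; the seed makes the run repeatable.
profile = simulate_ers_profile(25.0, n_steps=6, noise_fraction=0.05, seed=1)
print(len(profile), profile[0])
```

Running the function many times with different seeds would give a cohort of trajectories, which is the general shape of a Monte Carlo approach.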

RESULTS

A total of 120,000 3-hour VPs for pressure control (PC) and volume control (VC) ventilation modes were generated using stochastic simulation. Optimisation of the stochastic simulation process yielded an ideal noise percentage of 5-10% and 200,000 simulation iterations, allowing the simulation of a realistic and diverse set of Ers profiles. Results of self-validation show the retrospective Ers profiles could be recreated accurately, with a mean squared error of only 0.099 [0.009-0.790]% for the PC cohort and 0.051 [0.030-0.126]% for the VC cohort. A virtual trial demonstrates the ability of the stochastic VP cohort to capture Ers trends within and beyond the retrospective patient cohort, providing cohort-level validation.

CONCLUSION

VPs capable of temporal evolution demonstrate feasibility for use in designing, developing, and optimising bedside MV guidance protocols through in-silico simulation and validation. Overall, the temporal VPs developed using stochastic simulation alleviate the need for lengthy, resource intensive, high cost clinical trials, while facilitating statistically robust virtual trials, ultimately leading to improved patient care and outcomes in mechanical ventilation.

Greenberg JK, Landman JM, Kelly MP, Pennicooke BH, Molina CA, Foraker RE, Ray WZ. Leveraging Artificial Intelligence and Synthetic Data Derivatives for Spine Surgery Research. Global Spine J 2023;13:2409-2421. [PMID: 35373623 PMCID: PMC10538345 DOI: 10.1177/21925682221085535] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022] Open

Zuber S, Bechtiger L, Bodelet JS, Golin M, Heumann J, Kim JH, Klee M, Mur J, Noll J, Voll S, O’Keefe P, Steinhoff A, Zölitz U, Muniz-Terrera G, Shanahan L, Shanahan MJ, Hofer SM. An integrative approach for the analysis of risk and health across the life course: challenges, innovations, and opportunities for life course research. Discov Soc Sci Health 2023;3:14. [PMID: 37469576 PMCID: PMC10352429 DOI: 10.1007/s44155-023-00044-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/26/2023] [Accepted: 06/26/2023] [Indexed: 07/21/2023]

Affiliation(s)

Sascha Zuber Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC Canada Center for the Interdisciplinary Study of Gerontology and Vulnerability, University of Geneva, Geneva, Switzerland
Laura Bechtiger Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Julien Stéphane Bodelet Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Marta Golin Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Jens Heumann Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Jung Hyun Kim University of Luxembourg, Esch-sur-Alzette, Luxembourg
Matthias Klee University of Luxembourg, Esch-sur-Alzette, Luxembourg
Jure Mur University of Edinburgh, Edinburgh, Scotland
Jennie Noll Pennsylvania State University, State College, PA USA
Stacey Voll Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC Canada
Patrick O’Keefe Department of Neurology, Oregon Health & Science University, Portland, OR USA
Annekatrin Steinhoff Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland University Hospital of Child and Adolescent Psychiatry and Psychotherapy, University of Bern, Bern, Switzerland
Ulf Zölitz Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland
Graciela Muniz-Terrera Ohio University, Athens, OH USA
Lilly Shanahan Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland Department of Psychology, University of Zürich, Zürich, Switzerland
Michael J. Shanahan Jacobs Center for Productive Youth Development, University of Zürich, Zürich, Switzerland Department of Sociology, University of Zürich, Zürich, Switzerland
Scott M. Hofer Institute On Aging & Lifelong Health, University of Victoria, Victoria, BC Canada Department of Neurology, Oregon Health & Science University, Portland, OR USA

Azizi Z, Lindner S, Shiba Y, Raparelli V, Norris CM, Kublickiene K, Herrero MT, Kautzky-Willer A, Klimek P, Gisinger T, Pilote L, El Emam K. A comparison of synthetic data generation and federated analysis for enabling international evaluations of cardiovascular health. Sci Rep 2023;13:11540. [PMID: 37460705 DOI: 10.1038/s41598-023-38457-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2022] [Accepted: 07/08/2023] [Indexed: 07/20/2023] Open

Affiliation(s)

Zahra Azizi Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada
Simon Lindner Department of Internal Medicine III, Division of Endocrinology and Metabolism, Gender Medicine Unit, Medical University of Vienna, Vienna, Austria
Yumika Shiba Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada Faculty of Medicine, McGill University, Montreal, Canada
Valeria Raparelli Department of Translational Medicine, University of Ferrara, Ferrara, Italy Faculty of Nursing, University of Alberta, Edmonton, AB, Canada
Colleen M Norris Faculty of Nursing, University of Alberta, Edmonton, AB, Canada Heart and Stroke Strategic Clinical Networks, Alberta Health Services, Alberta, Canada
Karolina Kublickiene Karolinska Institute, Stockholm, Sweden
Maria Trinidad Herrero Clinical & Experimental Neuroscience (NiCE-IMIB-IUIE), School of Medicine, University of Murcia, Murcia, Spain
Alexandra Kautzky-Willer Department of Internal Medicine III, Division of Endocrinology and Metabolism, Gender Medicine Unit, Medical University of Vienna, Vienna, Austria
Peter Klimek Section for Science of Complex Systems, CeMSIIS, Medical University of Vienna, Vienna, Austria Complexity Science Hub Vienna, Vienna, Austria
Teresa Gisinger Division of Endocrinology and Metabolism, Medical University of Vienna, Vienna, Austria
Louise Pilote Centre for Outcomes Research and Evaluation, Research Institute of the McGill University Health Centre, 5252 De Maisonneuve Blvd, Office 2B.39, Montréal, QC, H4A 3S5, Canada. Divisions of Clinical Epidemiology and General Internal Medicine, McGill University Health Centre Research Institute, Montreal, QC, Canada.
Khaled El Emam Children's Hospital of Eastern Ontario Research Institute, 401 Smyth Road, Ottawa, ON, K1H 8L1, Canada. School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada. Replica Analytics Ltd, Ottawa, ON, Canada.

Deniz-Garcia A, Fabelo H, Rodriguez-Almeida AJ, Zamora-Zamorano G, Castro-Fernandez M, Alberiche Ruano MDP, Solvoll T, Granja C, Schopf TR, Callico GM, Soguero-Ruiz C, Wägner AM. Quality, Usability, and Effectiveness of mHealth Apps and the Role of Artificial Intelligence: Current Scenario and Challenges. J Med Internet Res 2023;25:e44030. [PMID: 37140973 PMCID: PMC10196903 DOI: 10.2196/44030] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/09/2022] [Revised: 02/19/2023] [Accepted: 03/10/2023] [Indexed: 03/12/2023] Open

Affiliation(s)

Alejandro Deniz-Garcia Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain
Himar Fabelo Complejo Hospitalario Universitario Insular - Materno Infantil, Fundación Canaria Instituto de Investigación Sanitaria de Canarias, Las Palmas de Gran Canaria, Spain Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Antonio J Rodriguez-Almeida Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Garlene Zamora-Zamorano Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Maria Castro-Fernandez Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Maria Del Pino Alberiche Ruano Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Terje Solvoll Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway Faculty of Nursing and Health Sciences, Nord University, Bodø, Norway
Conceição Granja Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway Faculty of Nursing and Health Sciences, Nord University, Bodø, Norway
Thomas Roger Schopf Norwegian Centre for E-health Research, University Hospital of North-Norway, Tromsø, Norway
Gustavo M Callico Research Institute for Applied Microelectronics, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Cristina Soguero-Ruiz Departamento de Teoría de la Señal y Comunicaciones y Sistemas Telemáticos y Computación, Universidad Rey Juan Carlos, Madrid, Spain
Ana M Wägner Endocrinology and Nutrition Department, Complejo Hospitalario Universitario Insular Materno Infantil, Las Palmas de Gran Canaria, Spain Instituto Universitario de Investigaciones Biomédicas y Sanitarias, Universidad de Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain

Ganguli R, Lad R, Lin A, Yu X. Novel Generative Recurrent Neural Network Framework to Produce Accurate, Applicable, and Deidentified Synthetic Medical Data for Patients With Metastatic Cancer. JCO Clin Cancer Inform 2023;7:e2200125. [PMID: 37130342 DOI: 10.1200/cci.22.00125] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023] Open

Abstract

PURPOSE

Sensitive patient data cannot be easily shared/analyzed, severely limiting the innovative progress of research, specifically for marginalized/under-represented populations. Existing methods of deidentification are subject to data breaches. The objective of this study was to develop a neural network capable of generating a synthetic version of data for patients with novel postoperative metastatic cancer.

METHODS

We analyzed a metastatic cancer patient cohort of 167,474 patients obtained from the National Surgical Quality Improvement Program. Twenty-seven clinical features were analyzed. We created a volume-matched synthetic cohort of 167,474 patients and a reduced-size synthetic cohort of 5,000 patients. The volume-matched and reduced-size synthetic cohorts were compared against the ground truth data to analyze differences in principal component distribution, underlying statistical properties/associations, intervariable correlations, and machine learning classifier performance when developed on the synthetic data.

RESULTS

Among 167,474 patients with metastatic cancer in the original data, 50,669 (30.3%) died within 30 days of their index surgery. Our model was able to accurately capture underlying statistical properties, principal components, and intervariable correlations within the ground truth data, yielding an accuracy of 93.2% with a loss of 0.21%, and develop synthetic data capable of training accurate machine learning classifiers. The reduced-size synthetic data accurately replicated all categorical variables and every continuous variable with statistically similar records (P > .05), with the sole exception of preoperative albumin (P < .05). The volume-matched synthetic data frame was able to accurately replicate all categorical variables (P > .05).

CONCLUSION

This described methodology can be applied to any structured medical data from any setting, significantly expedite scientific analysis/innovation, and be used to develop improved predictive classifiers with boosted tree-based algorithms, serving as the potential new gold standard of medical data sharing and data augmentation.

Davis SE, Ssemaganda H, Koola JD, Mao J, Westerman D, Speroff T, Govindarajulu US, Ramsay CR, Sedrakyan A, Ohno-Machado L, Resnic FS, Matheny ME. Simulating complex patient populations with hierarchical learning effects to support methods development for post-market surveillance. BMC Med Res Methodol 2023;23:89. [PMID: 37041457 PMCID: PMC10088292 DOI: 10.1186/s12874-023-01913-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 04/04/2023] [Indexed: 04/13/2023] Open

Abstract

BACKGROUND

Validating new algorithms, such as methods to disentangle intrinsic treatment risk from risk associated with experiential learning of novel treatments, often requires knowing the ground truth for data characteristics under investigation. Since the ground truth is inaccessible in real world data, simulation studies using synthetic datasets that mimic complex clinical environments are essential. We describe and evaluate a generalizable framework for injecting hierarchical learning effects within a robust data generation process that incorporates the magnitude of intrinsic risk and accounts for known critical elements in clinical data relationships.

METHODS

We present a multi-step data generating process with customizable options and flexible modules to support a variety of simulation requirements. Synthetic patients with nonlinear and correlated features are assigned to provider and institution case series. The probability of treatment and outcome assignment are associated with patient features based on user definitions. Risk due to experiential learning by providers and/or institutions when novel treatments are introduced is injected at various speeds and magnitudes. To further reflect real-world complexity, users can request missing values and omitted variables. We illustrate an implementation of our method in a case study using MIMIC-III data for reference patient feature distributions.

RESULTS

Realized data characteristics in the simulated data reflected specified values. Apparent deviations in treatment effects and feature distributions, though not statistically significant, were most common in small datasets (n < 3000) and attributable to random noise and variability in estimating realized values in small samples. When learning effects were specified, synthetic datasets exhibited changes in the probability of adverse outcomes as cases accrued for the treatment group impacted by learning, and stable probabilities as cases accrued for the treatment group not affected by learning.

CONCLUSIONS

Our framework extends clinical data simulation techniques beyond generation of patient features to incorporate hierarchical learning effects. This enables the complex simulation studies required to develop and rigorously test algorithms developed to disentangle treatment safety signals from the effects of experiential learning. By supporting such efforts, this work can help identify training opportunities, avoid unwarranted restriction of access to medical advances, and hasten treatment improvements.

Affiliation(s)

Sharon E Davis Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA.
Henry Ssemaganda Comparative Effectiveness Research Institute, Lahey Hospital and Medical Center, 41 Mall Road, Burlington, MA, 01803, USA
Jejo D Koola UC Health Department of Biomedical Informatics, University of California San Diego, 9500 Gilman Dr. MC 0728, La Jolla, San Diego, CA, 92093-0728, USA
Jialin Mao Department of Population Health Sciences, Weill Cornell Medicine, 1300 York Avenue, New York, NY, 10065, USA
Dax Westerman Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA
Theodore Speroff Departments of Medicine and Biostatistics, Vanderbilt University Medical Center, 1313 21St Avenue South, Oxford House, Room 209, Nashville, TN, 37232, USA
Usha S Govindarajulu Center for Biostatistics, Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, Box 1077, New York, NY, 10029, USA
Craig R Ramsay Health Services Research Unit, University of Aberdeen, Health Sciences Building, Foresterhill, 3rd Floor, Aberdeen, AB25 2ZD, UK
Art Sedrakyan Department of Population Health Sciences, Weill Cornell Medicine, 1300 York Avenue, New York, NY, 10065, USA
Lucila Ohno-Machado Biomedical Informatics and Data Science, Yale School of Medicine, 100 College Street, New Haven, CT, 06510, USA
Frederic S Resnic Division of Cardiovascular Medicine and Comparative Effectiveness Research Institute, Lahey Hospital and Medical Center, Tufts University School of Medicine, 41 Burlington Mall Road, Burlington, MA, 01805, USA
Michael E Matheny Departments of Biomedical Informatics, Biostatistics, and Medicine, Vanderbilt University Medical Center, 2525 West End Ave, Suite 1475, Nashville, TN, 37203, USA Geriatric Research Education and Clinical Care Center, Tennessee Valley Healthcare System VA, 1310 24th Avenue South, Nashville, TN, 37212, USA

Mosquera L, El Emam K, Ding L, Sharma V, Zhang XH, Kababji SE, Carvalho C, Hamilton B, Palfrey D, Kong L, Jiang B, Eurich DT. A method for generating synthetic longitudinal health data. BMC Med Res Methodol 2023;23:67. [PMID: 36959532 PMCID: PMC10034254 DOI: 10.1186/s12874-023-01869-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2022] [Accepted: 02/19/2023] [Indexed: 03/25/2023] Open

Abstract

Getting access to administrative health data for research purposes is a difficult and time-consuming process due to increasingly demanding privacy regulations. An alternative method for sharing administrative health data would be to share synthetic datasets where the records do not correspond to real individuals, but the patterns and relationships seen in the data are reproduced. This paper assesses the feasibility of generating synthetic administrative health data using a recurrent deep learning model. Our data comes from 120,000 individuals from Alberta Health's administrative health database. We assess how similar our synthetic data is to the real data using utility assessments that evaluate the structure and general patterns in the data, as well as by recreating a specific analysis in the real data commonly applied to this type of administrative health data. We also assess the privacy risks associated with the use of this synthetic dataset. Generic utility assessments used Hellinger distance to quantify the difference in distributions between real and synthetic datasets for event types (0.027), attributes (mean 0.0417), and Markov transition matrices (order 1 mean absolute difference: 0.0896, sd: 0.159; order 2: mean Hellinger distance 0.2195, sd: 0.2724). The Hellinger distance between the joint distributions was 0.352, and the similarity of random cohorts generated from real and synthetic data had a mean Hellinger distance of 0.3 and mean Euclidean distance of 0.064, indicating small differences between the distributions in the real data and the synthetic data. By applying a realistic analysis to both real and synthetic datasets, Cox regression hazard ratios achieved a mean confidence interval overlap of 68% for adjusted hazard ratios among 5 key outcomes of interest, indicating synthetic data produces similar analytic results to real data. The privacy assessment concluded that the attribution disclosure risk associated with this synthetic dataset was substantially less than the typical 0.09 acceptable risk threshold. Based on these metrics our results show that our synthetic data is suitably similar to the real data and could be shared for research purposes, thereby alleviating concerns associated with the sharing of real data in some circumstances.
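
The abstract above reports Hellinger distances between real and synthetic distributions. As a rough illustration of how such a value can be computed for a single categorical variable, here is a minimal Python sketch. The function name hellinger_distance and the example counts are hypothetical and are not taken from the paper; the authors' actual pipeline (event types, Markov transition matrices, joint distributions) is not reproduced here.

```python
import math


def hellinger_distance(real_counts, synthetic_counts):
    """Hellinger distance between two discrete distributions.

    Both arguments are sequences of non-negative counts over the same
    ordered set of categories. The result lies in [0, 1]: 0 means the
    two distributions are identical, 1 means they share no category.
    """
    if len(real_counts) != len(synthetic_counts):
        raise ValueError("both inputs must cover the same categories")
    real_total = sum(real_counts)
    synthetic_total = sum(synthetic_counts)
    if real_total <= 0 or synthetic_total <= 0:
        raise ValueError("each input must contain at least one observation")

    squared_diff_sum = 0.0
    for real, synthetic in zip(real_counts, synthetic_counts):
        # Convert counts to probabilities before comparing square roots.
        p = real / real_total
        q = synthetic / synthetic_total
        squared_diff_sum += (math.sqrt(p) - math.sqrt(q)) ** 2
    return math.sqrt(squared_diff_sum / 2.0)


# Hypothetical counts of three event types in a real and a synthetic dataset.
example_real = [50, 30, 20]
example_synthetic = [48, 33, 19]
example_distance = hellinger_distance(example_real, example_synthetic)
```

Values close to 0, such as those quoted in the abstract for event types and attributes, indicate that the synthetic distribution closely tracks the real one.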

Synthetic data in health care: A narrative review. PLOS DIGITAL HEALTH 2023;2:e0000082. [PMID: 36812604 PMCID: PMC9931305 DOI: 10.1371/journal.pdig.0000082] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Accepted: 12/06/2022] [Indexed: 01/09/2023]

Arora A, Arora A. Machine learning models trained on synthetic datasets of multiple sample sizes for the use of predicting blood pressure from clinical data in a national dataset. PLoS One 2023;18:e0283094. [PMID: 36928534 PMCID: PMC10019654 DOI: 10.1371/journal.pone.0283094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2023] [Accepted: 03/01/2023] [Indexed: 03/18/2023] Open

Abstract

INTRODUCTION

The potential for synthetic data to act as a replacement for real data in research has attracted attention in recent months due to the prospect of increasing access to data and overcoming data privacy concerns when sharing data. The field of generative artificial intelligence and synthetic data is still early in its development, with a research gap in evidence that synthetic data can adequately be used to train algorithms that can be used on real data. This study compares the performance of a series of machine learning models trained on real data and synthetic data, based on the National Diet and Nutrition Survey (NDNS).

METHODS

Features identified to be potentially of relevance by directed acyclic graphs were isolated from the NDNS dataset and used to construct synthetic datasets and impute missing data. Recursive feature elimination identified only four variables needed to predict mean arterial blood pressure: age, sex, weight and height. Bayesian generalised linear regression, random forest and neural network models were constructed based on these four variables to predict blood pressure. Models were trained on the real data training set (n = 2408), a synthetic data training set (n = 2408), a larger synthetic data training set (n = 4816), and a combination of the real and synthetic data training sets (n = 4816). The same test set (n = 424) was used for each model.

RESULTS

Synthetic datasets demonstrated a high degree of fidelity with the real dataset. There was no significant difference between the performance of models trained on real, synthetic or combined datasets. Mean average error across all models and all training data ranged from 8.12 to 8.33. This indicates that synthetic data was capable of training machine learning models as accurate as those trained on real data.

DISCUSSION

Further research is needed on a variety of datasets to confirm the utility of synthetic data to replace the use of potentially identifiable patient data. There is also further urgent research needed into evidencing that synthetic data can truly protect patient privacy against adversarial attempts to re-identify real individuals from the synthetic dataset.

Braddon AE, Robinson S, Alati R, Betts KS. Exploring the utility of synthetic data to extract more value from sensitive health data assets: A focused example in perinatal epidemiology. Paediatr Perinat Epidemiol 2022;37:292-300. [PMID: 36482827 DOI: 10.1111/ppe.12942] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 05/11/2022] [Revised: 11/09/2022] [Accepted: 11/17/2022] [Indexed: 12/13/2022]

Baumfeld Andre E, Carrington N, Siami FS, Hiatt JC, McWilliams C, Hiller C, Surinach A, Zamorano A, Pashos CL, Schulz WL. The Current Landscape and Emerging Applications for Real-World Data in Diagnostics and Clinical Decision Support and its Impact on Regulatory Decision Making. Clin Pharmacol Ther 2022;112:1172-1182. [PMID: 35213741 PMCID: PMC9790425 DOI: 10.1002/cpt.2565] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2021] [Accepted: 02/03/2022] [Indexed: 01/31/2023]

El Emam K, Mosquera L, Fang X. Validating a membership disclosure metric for synthetic health data. JAMIA Open 2022;5:ooac083. [PMID: 36238080 PMCID: PMC9553223 DOI: 10.1093/jamiaopen/ooac083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/20/2022] [Revised: 09/13/2022] [Accepted: 09/22/2022] [Indexed: 11/24/2022] Open

Shi J, Wang D, Tesei G, Norgeot B. Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments. Front Artif Intell 2022;5:918813. [PMID: 36187323 PMCID: PMC9515575 DOI: 10.3389/frai.2022.918813] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/12/2022] [Accepted: 08/15/2022] [Indexed: 12/03/2022] Open

Abstract

In the past decade, there has been exponentially growing interest in the use of observational data collected as a part of routine healthcare practice to determine the effect of a treatment with causal inference models. Validation of these models, however, has been a challenge because the ground truth is unknown: only one treatment-outcome pair for each person can be observed. There have been multiple efforts to fill this void using synthetic data where the ground truth can be generated. However, to date, these datasets have been severely limited in their utility either by being modeled after small non-representative patient populations, being dissimilar to real target populations, or only providing known effects for two cohorts (treated vs. control). In this work, we produced a large-scale and realistic synthetic dataset that provides ground truth effects for over 10 hypertension treatments on blood pressure outcomes. The synthetic dataset was created by modeling a nationwide cohort of more than 580,000 hypertension patients, including each person's multi-year history of diagnoses, medications, and laboratory values. We designed a data generation process by combining an adapted ADS-GAN model for fictitious patient information generation and a neural network for treatment outcome generation. A Wasserstein distance of 0.35 demonstrates that our synthetic data follows a nearly identical joint distribution to the patient cohort used to generate the data. Patient privacy was a primary concern for this study; the ϵ-identifiability metric, which estimates the probability of actual patients being identified, is 0.008%, ensuring that our synthetic data cannot be used to identify any actual patients. To demonstrate its usage, we tested the bias in causal effect estimation of four well-established models using this dataset. The approach we used can be readily extended to other types of diseases in the clinical domain, and to datasets in other domains as well.

La Salvia M, Torti E, Leon R, Fabelo H, Ortega S, Martinez-Vega B, Callico GM, Leporati F. Deep Convolutional Generative Adversarial Networks to Enhance Artificial Intelligence in Healthcare: A Skin Cancer Application. SENSORS (BASEL, SWITZERLAND) 2022;22:6145. [PMID: 36015906 PMCID: PMC9416026 DOI: 10.3390/s22166145] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/11/2022] [Revised: 08/04/2022] [Accepted: 08/14/2022] [Indexed: 06/15/2023]

Door to balloon time in primary percutaneous coronary intervention in ST elevation myocardial infarction: every minute counts. Coron Artery Dis 2022;33:341-348. [PMID: 35880558 DOI: 10.1097/mca.0000000000001145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]

Thomas JA, Foraker RE, Zamstein N, Morrow JD, Payne PRO, Wilcox AB. Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). J Am Med Inform Assoc 2022;29:1350-1365. [PMID: 35357487 PMCID: PMC8992357 DOI: 10.1093/jamia/ocac045] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 03/11/2022] [Accepted: 03/28/2022] [Indexed: 11/16/2022] Open

The Association Between Opioid Use and Opioid Type and the Clinical Course and Outcomes of Acute Pancreatitis. Pancreas 2022;51:523-530. [PMID: 35835104 DOI: 10.1097/mpa.0000000000002052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/10/2022]

Brzezinski RY, Melloul A, Berliner S, Goldiner I, Stark M, Rogowski O, Banai S, Shenhar-Tsarfaty S, Shacham Y. Early Detection of Inflammation-Prone STEMI Patients Using the CRP Troponin Test (CTT). J Clin Med 2022;11:jcm11092453. [PMID: 35566579 PMCID: PMC9105044 DOI: 10.3390/jcm11092453] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2022] [Revised: 04/21/2022] [Accepted: 04/25/2022] [Indexed: 02/01/2023] Open

Abstract

Elevated concentrations of C-reactive protein (CRP) early during an acute coronary syndrome (ACS) may reflect the magnitude of the inflammatory response to myocardial damage and are associated with worse outcome. However, the routine measurement of both CRP and cardiac troponin simultaneously in the setting of ST-segment elevation myocardial infarction (STEMI) is not used broadly. Here, we sought to identify and characterize individuals who are prone to an elevated inflammatory response following STEMI by using a combined CRP and troponin test (CTT) and determine their short- and long-term outcome. We retrospectively examined 1186 patients with the diagnosis of acute STEMI, who had at least two successive measurements of combined CRP and cardiac troponin (up to 6 h apart), all within the first 48 h of admission. We used Chi-Square Automatic Interaction Detector (CHAID) tree analysis to determine which parameters, timing (baseline vs. serial measurements), and cut-offs should be used to predict mortality. Patients with high CRP concentrations (above 90th percentile, >33 mg/L) had higher 30-day and all-cause mortality rates compared to the rest of the cohort, regardless of their troponin test status (above or below 118,000 ng/L); 14.4% vs. 2.7%, p < 0.01. Furthermore, patients with both high CRP and high troponin levels on their second measurement had the highest 30-day mortality rates compared to the rest of the cohort; 21.4% vs. 3.7%, p < 0.01. These patients also had the highest all-cause mortality rates after a median follow-up of 4.5 years compared to the rest of the cohort; 42.9% vs. 12.7%, p < 0.01. In conclusion, serial measurements of both CRP and cardiac troponin might detect patients at increased risk for short- and long-term mortality following STEMI. We suggest the future use of the combined CTT as a potential early marker for inflammatory-prone patients with worse outcomes following ACS. This sub-type of patients might benefit from early anti-inflammatory therapy such as colchicine and anti-interleukin-1β agents.

Affiliation(s)

Rafael Y. Brzezinski Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Ariel Melloul Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Shlomo Berliner Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Ilana Goldiner Department of Clinical Laboratories, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Moshe Stark Department of Clinical Laboratories, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Ori Rogowski Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Shmuel Banai Department of Cardiology, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Shani Shenhar-Tsarfaty Internal Medicine “C”, “D”, and “E”, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
Yacov Shacham Department of Cardiology, Tel Aviv Medical Center, Affiliated with the Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 69978, Israel (corresponding author)

The “Coherent Data Set”: Combining Patient Data and Imaging in a Comprehensive, Synthetic Health Record. ELECTRONICS 2022. [DOI: 10.3390/electronics11081199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/10/2022]

El Emam K, Mosquera L, Fang X, El-Hussuna A. Utility Metrics for Evaluating Synthetic Health Data Generation Methods: Validation Study. JMIR Med Inform 2022;10:e35734. [PMID: 35389366 PMCID: PMC9030990 DOI: 10.2196/35734] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Revised: 01/27/2022] [Accepted: 02/13/2022] [Indexed: 01/06/2023] Open

Bahouth F, Elias A, Ghersin I, Khoury E, Bar O, Sholy H, Khoury J, Azzam ZS. The prognostic value of heart rate at discharge in acute decompensation of heart failure with reduced ejection fraction. ESC Heart Fail 2022;9:585-594. [PMID: 34821080 PMCID: PMC8788061 DOI: 10.1002/ehf2.13710] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/06/2021] [Revised: 09/30/2021] [Accepted: 10/31/2021] [Indexed: 11/26/2022] Open

Abstract

AIMS

The effect of elevated heart rate (HR) on morbidity and mortality is evident in chronic stable heart failure; data in this regard in acute decompensated heart failure (ADHF) setting are scarce. In this single-centre study, we sought to address the prognostic value of HR and beta-blocker dosage at discharge on all-cause mortality among patients with heart failure and reduced ejection fraction and ADHF.

METHODS AND RESULTS

In this retrospective observational study, 2945 patients were admitted for the first time with the primary diagnosis of ADHF between January 2008 and February 2018. Patients were divided by resting HR at discharge into three groups (HR < 70 b.p.m., HR 70-90 b.p.m., and HR > 90 b.p.m.). Evidence-based beta-blockers were defined as metoprolol, bisoprolol, and carvedilol. The doses of prescribed beta-blockers were converted into a percentage of the target dose of each beta-blocker and divided into four categories: 0 < Dose ≤ 25%, 25% < Dose ≤ 50%, 50% < Dose ≤ 75%, and >75% of the target dose. Cox regression was used to calculate the hazard ratio for the HR categories, adjusting for clinical and laboratory variables. At discharge, 1226 patients had an HR < 70 b.p.m., 1347 patients had an HR in the range 70-90 b.p.m., and 372 patients had an HR > 90 b.p.m. The 30-day mortality rate was 2.2%, 3.7%, and 12.1% (P < 0.001), respectively. Concordantly, the 1-year mortality rate was 14.6%, 16.7%, and 30.4% (P < 0.001) among patients with HR < 70 b.p.m., HR 70-90 b.p.m., and HR > 90 b.p.m., respectively. The adjusted hazard ratio was significantly increased only in the HR above 90 b.p.m. category (hazard ratio, 2.318; 95% confidence interval, 1.794-2.996).

CONCLUSIONS

Patients with ADHF and an HR of <90 b.p.m. at discharge had a significantly lower 1-year mortality, independent of the dosage of beta-blocker at discharge. It is conceivable to discharge these patients with lower HR.

Borreda I, Zukermann R, Epstein D, Marcusohn E. IV Sodium Ferric Gluconate Complex in Patients Hospitalized Due to Acute Decompensated Heart Failure and Iron Deficiency. J Cardiovasc Pharmacol Ther 2022;27:10742484211055639. [PMID: 34994220 DOI: 10.1177/10742484211055639] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]

Nakhleh A, Saiegh L, Shehadeh N, Weintrob N, Sheikh-Ahmad M, Supino-Rosin L, Alboim S, Gendelman R, Zloczower M. Screening for non-classic congenital adrenal hyperplasia in women: New insights using different immunoassays. Front Endocrinol (Lausanne) 2022;13:1048663. [PMID: 36704043 PMCID: PMC9871807 DOI: 10.3389/fendo.2022.1048663] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/19/2022] [Accepted: 12/19/2022] [Indexed: 01/11/2023] Open

Gorelik Y, Bloch-Isenberg N, Hashoul S, Heyman SN, Khamaisi M. Hyperglycemia on Admission Predicts Acute Kidney Failure and Renal Functional Recovery among Inpatients. J Clin Med 2021;11:jcm11010054. [PMID: 35011805 PMCID: PMC8745405 DOI: 10.3390/jcm11010054] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2021] [Revised: 12/18/2021] [Accepted: 12/19/2021] [Indexed: 02/07/2023] Open

Marcusohn E, Gibory I, Miller A, Lipsky AM, Neuberger A, Epstein D. The association between the degree of fever as measured in the emergency department and clinical outcomes of hospitalized adult patients. Am J Emerg Med 2021;52:92-98. [PMID: 34894473 DOI: 10.1016/j.ajem.2021.11.045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Revised: 11/21/2021] [Accepted: 11/29/2021] [Indexed: 12/29/2022] Open

Abstract

BACKGROUND

Fever is a physiologic response to a wide range of pathologies and one of the most common complaints and clinical signs in the emergency medicine department (ED). The association between fever magnitude and clinical outcomes has been evaluated in specific populations with inconsistent results.

OBJECTIVES

In this study we aimed to investigate the association between the degree of fever in the ED and clinical outcomes of hospitalized febrile adult patients.

METHODS

This was a retrospective single-center cohort study of all the patients with maximal body temperature (BT) ≥ 38.0 °C, as recorded during the ED evaluation, who were hospitalized between January 2015 and December 2020. Patients with heatstroke were excluded. The primary outcome was 30-day all-cause mortality and secondary outcomes were intensive care unit (ICU) admission and development of acute kidney injury (AKI).

RESULTS

Fever was recorded among 8.1% of patients evaluated in the ED. Elevated BT was associated with increased risk of hospital admission (70.3% vs. 49.4%, p < 0.001), 30-day mortality (12.3% vs. 2.6%, p < 0.001), ICU admission (5.7% vs. 2.8%, p < 0.001), and AKI (11.7% vs. 3.8%, p < 0.001). After exclusion of nine patients with heatstroke, 21,252 hospitalized febrile patients were included in the final analysis. BT > 39.7 °C was progressively associated with increased mortality (OR 1.64-2.22, 95% CI 1.16-2.81, p < 0.005) as compared to BT 38.0-38.1 °C. More AKI events were observed in patients with BT > 39.5 °C (OR 1.48-2.91, 95% CI 1.11-3.66, p < 0.007). Temperature between 39.2 and 39.5 °C was associated with lower mortality (OR 0.62-0.71, 95% CI 0.51-0.87, p < 0.001). In a multiple logistic regression analysis, BT > 39.9 °C was independently associated with increased mortality and AKI. BT > 39.7 °C was progressively associated with an increased risk of ICU admission.

CONCLUSION

Among febrile patients admitted to the hospital, BT > 39.5 °C was associated with adverse clinical course, as compared to patients with lower-grade fever (38.0-38.1 °C). These patients should be flagged on arrival to the ED and likely warrant more aggressive evaluation and treatment.

Foraker R, Guo A, Thomas J, Zamstein N, Payne PR, Wilcox A. The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data. J Med Internet Res 2021;23:e30697. [PMID: 34559671 PMCID: PMC8491642 DOI: 10.2196/30697] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Revised: 08/24/2021] [Accepted: 09/12/2021] [Indexed: 01/22/2023] Open

Abstract

BACKGROUND

Computationally derived ("synthetic") data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic.

OBJECTIVE

We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes.

METHODS

We used the National COVID Cohort Collaborative's instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19-positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-19-related measures and outcomes, and constructing their epidemic curves. We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data.

RESULTS

For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts.

CONCLUSIONS

This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights.

Korytny A, Klein A, Marcusohn E, Freund Y, Neuberger A, Raz A, Miller A, Epstein D. Hypocalcemia is associated with adverse clinical course in patients with upper gastrointestinal bleeding. Intern Emerg Med 2021;16:1813-1822. [PMID: 33651325 DOI: 10.1007/s11739-021-02671-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/18/2020] [Accepted: 02/11/2021] [Indexed: 12/26/2022]

Brzezinski RY, Rabin N, Lewis N, Peled R, Kerpel A, Tsur AM, Gendelman O, Naftali-Shani N, Gringauz I, Amital H, Leibowitz A, Mayan H, Ben-Zvi I, Heller E, Shechtman L, Rogowski O, Shenhar-Tsarfaty S, Konen E, Marom EM, Ironi A, Rahav G, Zimmer Y, Grossman E, Ovadia-Blechman Z, Leor J, Hoffer O. Automated processing of thermal imaging to detect COVID-19. Sci Rep 2021;11:17489. [PMID: 34471180 PMCID: PMC8410809 DOI: 10.1038/s41598-021-96900-9] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2021] [Accepted: 08/17/2021] [Indexed: 01/08/2023] Open

Affiliation(s)

Rafael Y Brzezinski Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
Neta Rabin Faculty of Engineering, Tel-Aviv University, Tel Aviv, Israel
Nir Lewis Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
Racheli Peled Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
Ariel Kerpel Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Israel Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
Avishai M Tsur Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel Israel Defense Forces, Medical Corps, Ramat Gan, Israel
Omer Gendelman Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
Nili Naftali-Shani Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
Irina Gringauz Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Geriatrics Division, Sheba Medical Center, Tel Hashomer, Israel
Howard Amital Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
Avshalom Leibowitz Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
Haim Mayan Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
Ilan Ben-Zvi Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
Eyal Heller Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
Liran Shechtman Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine B, D, E, and F, Sheba Medical Center, Tel Hashomer, Israel
Ori Rogowski Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine C, D, and E, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
Shani Shenhar-Tsarfaty Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine C, D, and E, Tel Aviv Sourasky Medical Center, Tel Aviv, Israel
Eli Konen Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Israel Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
Edith M Marom Department of Diagnostic Imaging, Sheba Medical Center, Tel Hashomer, Israel Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
Avinoah Ironi Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Department of Emergency Medicine, Sheba Medical Center, Tel Hashomer, Israel
Galia Rahav Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Infectious Disease Unit, Sheba Medical Center, Tel Hashomer, Israel
Yair Zimmer School of Medical Engineering, Afeka Tel Aviv Academic College of Engineering, Tel Aviv, Israel
Ehud Grossman Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Internal Medicine Wing and Hypertension Unit, Sheba Medical Center, Tel Hashomer, Israel
Zehava Ovadia-Blechman School of Medical Engineering, Afeka Tel Aviv Academic College of Engineering, Tel Aviv, Israel
Jonathan Leor Neufeld Cardiac Research Institute, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel Tamman Cardiovascular Research Institute, Leviev Heart Center, Sheba Medical Center, 52621, Tel Hashomer, Israel
Oshrit Hoffer School of Electrical Engineering, Afeka Tel Aviv Academic College of Engineering, Tel Aviv, Israel

Weber Y, Epstein D, Miller A, Segal G, Berger G. Association of Low Alanine Aminotransferase Values with Extubation Failure in Adult Critically Ill Patients: A Retrospective Cohort Study. J Clin Med 2021;10:jcm10153282. [PMID: 34362065 PMCID: PMC8348471 DOI: 10.3390/jcm10153282] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2021] [Revised: 07/21/2021] [Accepted: 07/23/2021] [Indexed: 11/16/2022] Open

Thomas JA, Foraker RE, Zamstein N, Payne PR, Wilcox AB. Demonstrating an approach for evaluating synthetic geospatial and temporal epidemiologic data utility: Results from analyzing >1.8 million SARS-CoV-2 tests in the United States National COVID Cohort Collaborative (N3C). MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2021:2021.07.06.21259051. [PMID: 34268525 PMCID: PMC8282114 DOI: 10.1101/2021.07.06.21259051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]

Pereira T, Morgado J, Silva F, Pelter MM, Dias VR, Barros R, Freitas C, Negrão E, Flor de Lima B, Correia da Silva M, Madureira AJ, Ramos I, Hespanhol V, Costa JL, Cunha A, Oliveira HP. Sharing Biomedical Data: Strengthening AI Development in Healthcare. Healthcare (Basel) 2021;9:healthcare9070827. [PMID: 34208830 PMCID: PMC8303863 DOI: 10.3390/healthcare9070827] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2021] [Revised: 06/11/2021] [Accepted: 06/22/2021] [Indexed: 01/17/2023] Open

Affiliation(s)

Tania Pereira INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal; (J.M.); (F.S.); (V.R.D.); (R.B.); (A.C.); (H.P.O.) Correspondence:
Joana Morgado INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal; (J.M.); (F.S.); (V.R.D.); (R.B.); (A.C.); (H.P.O.) FCUP—Faculty of Science, University of Porto, 4169-007 Porto, Portugal
Francisco Silva INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal; (J.M.); (F.S.); (V.R.D.); (R.B.); (A.C.); (H.P.O.)
Michele M. Pelter Department of Physiological Nursing, School of Nursing, University of California, San Francisco, CA 94143, USA;
Vasco Rosa Dias INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal; (J.M.); (F.S.); (V.R.D.); (R.B.); (A.C.); (H.P.O.)
Rita Barros INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal; (J.M.); (F.S.); (V.R.D.); (R.B.); (A.C.); (H.P.O.)
Cláudia Freitas CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal; (C.F.); (E.N.); (B.F.d.L.); (M.C.d.S.); (A.J.M.); (I.R.); (V.H.) FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal;
Eduardo Negrão CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal; (C.F.); (E.N.); (B.F.d.L.); (M.C.d.S.); (A.J.M.); (I.R.); (V.H.)
Beatriz Flor de Lima CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal; (C.F.); (E.N.); (B.F.d.L.); (M.C.d.S.); (A.J.M.); (I.R.); (V.H.)
Miguel Correia da Silva CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal; (C.F.); (E.N.); (B.F.d.L.); (M.C.d.S.); (A.J.M.); (I.R.); (V.H.)
António J. Madureira CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal; (C.F.); (E.N.); (B.F.d.L.); (M.C.d.S.); (A.J.M.); (I.R.); (V.H.) FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal;
Isabel Ramos CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal; (C.F.); (E.N.); (B.F.d.L.); (M.C.d.S.); (A.J.M.); (I.R.); (V.H.) FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal;
Venceslau Hespanhol CHUSJ—Centro Hospitalar e Universitário de São João, 4200-319 Porto, Portugal; (C.F.); (E.N.); (B.F.d.L.); (M.C.d.S.); (A.J.M.); (I.R.); (V.H.) FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal;
José Luis Costa FMUP—Faculty of Medicine, University of Porto, 4200-319 Porto, Portugal; i3S—Institute for Research and Innovation in Health of the University of Porto, 4200-135 Porto, Portugal IPATIMUP—Institute of Molecular Pathology and Immunology of the University of Porto, 4200-135 Porto, Portugal
António Cunha INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal; (J.M.); (F.S.); (V.R.D.); (R.B.); (A.C.); (H.P.O.) UTAD—University of Trás-os-Montes and Alto Douro, 5001-801 Vila Real, Portugal
Hélder P. Oliveira INESC TEC—Institute for Systems and Computer Engineering, Technology and Science, 4200-465 Porto, Portugal; (J.M.); (F.S.); (V.R.D.); (R.B.); (A.C.); (H.P.O.) FCUP—Faculty of Science, University of Porto, 4169-007 Porto, Portugal

Azizi Z, Zheng C, Mosquera L, Pilote L, El Emam K. Can synthetic data be a proxy for real clinical trial data? A validation study. BMJ Open 2021;11:e043497. [PMID: 33863713 PMCID: PMC8055130 DOI: 10.1136/bmjopen-2020-043497] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/06/2020] [Revised: 01/14/2021] [Accepted: 03/18/2021] [Indexed: 11/03/2022] Open

Abstract

OBJECTIVES

There are increasing requirements to make research data, especially clinical trial data, more broadly available for secondary analyses. However, data availability remains a challenge due to complex privacy requirements. This challenge can potentially be addressed using synthetic data.

SETTING

Replication of a published stage III colon cancer trial secondary analysis using synthetic data generated by a machine learning method.

PARTICIPANTS

There were 1543 patients in the control arm that were included in our analysis.

PRIMARY AND SECONDARY OUTCOME MEASURES

Analyses from a study published on the real dataset were replicated on synthetic data to investigate the relationship between bowel obstruction and event-free survival. Information theoretic metrics were used to compare the univariate distributions between real and synthetic data. Percentage CI overlap was used to assess the similarity in the size of the bivariate relationships, and similarly for the multivariate Cox models derived from the two datasets.

RESULTS

Analysis results were similar between the real and synthetic datasets. The univariate distributions were within 1% of difference on an information theoretic metric. All of the bivariate relationships had CI overlap on the tau statistic above 50%. The main conclusion from the published study, that lack of bowel obstruction has a strong impact on survival, was replicated directionally and the HR CI overlap between the real and synthetic data was 61% for overall survival (real data: HR 1.56, 95% CI 1.11 to 2.2; synthetic data: HR 2.03, 95% CI 1.44 to 2.87) and 86% for disease-free survival (real data: HR 1.51, 95% CI 1.18 to 1.95; synthetic data: HR 1.63, 95% CI 1.26 to 2.1).
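The abstract does not state how percentage CI overlap was calculated. A minimal sketch, assuming a common definition (the average, over the two intervals, of the fraction of each interval covered by their intersection) and applying it directly on the hazard ratio scale, is shown below. The function name `ci_overlap` and the choice to clip disjoint intervals at zero are assumptions made for illustration. Under these assumptions, the intervals quoted above give values that round to the reported 61% and 86%.

```python
def ci_overlap(real_ci, synth_ci):
    """Average fraction of each interval covered by their intersection."""
    (l_r, u_r), (l_s, u_s) = real_ci, synth_ci
    # Length of the intersection. Disjoint intervals are clipped to zero here;
    # that clipping is an assumption of this sketch, not taken from the study.
    inter = max(0.0, min(u_r, u_s) - max(l_r, l_s))
    return 0.5 * (inter / (u_r - l_r) + inter / (u_s - l_s))


# Hazard ratio confidence intervals quoted in the abstract (real, synthetic).
overall = ci_overlap((1.11, 2.2), (1.44, 2.87))
disease_free = ci_overlap((1.18, 1.95), (1.26, 2.1))

print(f"overall survival: {overall:.0%}")
print(f"disease-free survival: {disease_free:.0%}")
```

Because the measure divides by each interval's own width, a narrow real interval that sits entirely inside a wide synthetic one still scores below 100%, which penalizes synthetic estimates that are much less precise than the real ones.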

CONCLUSIONS

The high concordance between the analytical results and conclusions from synthetic and real data suggests that synthetic data can be used as a reasonable proxy for real clinical trial datasets.

TRIAL REGISTRATION NUMBER

NCT00079274.

Kaur D, Sobiesk M, Patil S, Liu J, Bhagat P, Gupta A, Markuzon N. Application of Bayesian networks to generate synthetic health data. J Am Med Inform Assoc 2021;28:801-811. [PMID: 33367620 DOI: 10.1093/jamia/ocaa303] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2020] [Accepted: 11/16/2020] [Indexed: 01/08/2023] Open

Fake It Till You Make It: Guidelines for Effective Synthetic Data Generation. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11052158] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]

Vourganas I, Stankovic V, Stankovic L. Individualised Responsible Artificial Intelligence for Home-Based Rehabilitation. SENSORS (BASEL, SWITZERLAND) 2020;21:E2. [PMID: 33374913 PMCID: PMC7792599 DOI: 10.3390/s21010002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/06/2020] [Revised: 12/09/2020] [Accepted: 12/17/2020] [Indexed: 01/23/2023]

Epstein D, Solomon N, Korytny A, Marcusohn E, Freund Y, Avrahami R, Neuberger A, Raz A, Miller A. Association between ionised calcium and severity of postpartum haemorrhage: a retrospective cohort study. Br J Anaesth 2020;126:1022-1028. [PMID: 33341222 DOI: 10.1016/j.bja.2020.11.020] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2020] [Revised: 10/12/2020] [Accepted: 11/03/2020] [Indexed: 01/07/2023] Open

Foraker RE, Yu SC, Gupta A, Michelson AP, Pineda Soto JA, Colvin R, Loh F, Kollef MH, Maddox T, Evanoff B, Dror H, Zamstein N, Lai AM, Payne PRO. Spot the difference: comparing results of analyses from real patient data and synthetic derivatives. JAMIA Open 2020;3:557-566. [PMID: 33623891 PMCID: PMC7886551 DOI: 10.1093/jamiaopen/ooaa060] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 10/14/2020] [Accepted: 10/20/2020] [Indexed: 12/19/2022] Open

Affiliation(s)

Randi E Foraker Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Sean C Yu Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Aditi Gupta Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Andrew P Michelson Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Jose A Pineda Soto Division of Critical Care Medicine, Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Los Angeles, Los Angeles, California, USA
Ryan Colvin Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA Division of Critical Care Medicine, Department of Anesthesiology and Critical Care Medicine, Children's Hospital of Los Angeles, Los Angeles, California, USA
Francis Loh School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Marin H Kollef Division of Pulmonary and Critical Care Medicine, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Thomas Maddox Healthcare Innovation Lab, BJC Healthcare, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Bradley Evanoff Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Hovav Dror MDClone Ltd, Beer Sheva, Israel
Noa Zamstein MDClone Ltd, Beer Sheva, Israel
Albert M Lai Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA
Philip R O Payne Division of General Medical Sciences, Department of Medicine, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA Department of Medicine, Institute for Informatics, School of Medicine, Washington University in St. Louis, St. Louis, Missouri, USA

Jeon S, Seo J, Kim S, Lee J, Kim JH, Sohn JW, Moon J, Joo HJ. Proposal and Assessment of a De-Identification Strategy to Enhance Anonymity of the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM) in a Public Cloud-Computing Environment: Anonymization of Medical Data Using Privacy Models. J Med Internet Res 2020;22:e19597. [PMID: 33177037 PMCID: PMC7728527 DOI: 10.2196/19597] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2020] [Revised: 07/29/2020] [Accepted: 11/11/2020] [Indexed: 02/01/2023] Open

Abstract

Background

De-identifying personal information is critical when using personal health data for secondary research. The Observational Medical Outcomes Partnership Common Data Model (CDM), defined by the nonprofit organization Observational Health Data Sciences and Informatics, has been gaining attention for its use in the analysis of patient-level clinical data obtained from various medical institutions. When analyzing such data in a public environment such as a cloud-computing system, an appropriate de-identification strategy is required to protect patient privacy.

Objective

This study proposes and evaluates a de-identification strategy that comprises several rules along with privacy models such as k-anonymity, l-diversity, and t-closeness. The proposed strategy was evaluated using the actual CDM database.

Methods

The CDM database used in this study was constructed by the Anam Hospital of Korea University. Analysis and evaluation were performed using the ARX anonymizing framework in combination with the k-anonymity, l-diversity, and t-closeness privacy models.

Results

The CDM database, which was constructed according to the rules established by Observational Health Data Sciences and Informatics, exhibited a low risk of re-identification: The highest re-identifiable record rate (11.3%) in the dataset was exhibited by the DRUG_EXPOSURE table, with a re-identification success rate of 0.03%. However, because all tables include at least one “highest risk” value of 100%, suitable anonymizing techniques are required; moreover, the CDM database preserves the “source values” (raw data), a combination of which could increase the risk of re-identification. Therefore, this study proposes an enhanced strategy to de-identify the source values to significantly reduce not only the highest risk in the k-anonymity, l-diversity, and t-closeness privacy models but also the overall possibility of re-identification.
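To make the k-anonymity and "highest risk" figures concrete, the following is a minimal sketch and not the ARX implementation. It assumes that records sharing the same quasi-identifier values form an equivalence class, that k is the size of the smallest class, and that the highest risk is the reciprocal of that size, so a single unique record yields a highest risk of 100%. The column names and toy records are hypothetical and are not drawn from the study's CDM database.

```python
from collections import Counter


def equivalence_class_sizes(rows, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest equivalence class (the table is k-anonymous for this k)."""
    return min(equivalence_class_sizes(rows, quasi_identifiers).values())


def highest_risk(rows, quasi_identifiers):
    """Worst-case re-identification risk, taken here as 1 / smallest class size."""
    return 1.0 / k_anonymity(rows, quasi_identifiers)


# Hypothetical toy records for illustration only.
records = [
    {"birth_year": 1950, "sex": "F", "zip3": "021"},
    {"birth_year": 1950, "sex": "F", "zip3": "021"},
    {"birth_year": 1950, "sex": "M", "zip3": "021"},
    {"birth_year": 1972, "sex": "M", "zip3": "100"},
    {"birth_year": 1972, "sex": "M", "zip3": "100"},
]
qi = ["birth_year", "sex", "zip3"]

k = k_anonymity(records, qi)      # one record is unique on all three columns
risk = highest_risk(records, qi)  # so the worst-case risk is 100%

# Dropping (fully generalizing) "sex" merges the unique record into a larger class.
k_generalized = k_anonymity(records, ["birth_year", "zip3"])

print(k, risk, k_generalized)
```

The sketch illustrates why a table can show a low overall re-identifiable record rate and still report a highest risk of 100%: one record in a class of its own is enough.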

Conclusions

Our proposed de-identification strategy effectively enhanced the privacy of the CDM database, thereby encouraging clinical research involving multiple centers.

Gillies CE, Taylor DF, Cummings BC, Ansari S, Islim F, Kronick SL, Medlin RP, Ward KR. Demonstrating the consequences of learning missingness patterns in early warning systems for preventative health care: A novel simulation and solution. J Biomed Inform 2020;110:103528. [PMID: 32795506 DOI: 10.1016/j.jbi.2020.103528] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/17/2020] [Revised: 05/20/2020] [Accepted: 08/03/2020] [Indexed: 01/04/2023]

Abstract

When using tree-based methods to develop predictive analytics and early warning systems for preventive healthcare, it is important to use an appropriate imputation method to prevent learning the missingness pattern. To demonstrate this, we developed a novel simulation that generated synthetic electronic health record data using a variational autoencoder with a custom loss function, which took into account the high missing rate of electronic health data. We showed that when tree-based methods learn missingness patterns (correlated with adverse events) in electronic health record data, this leads to decreased performance if the system is used in a new setting that has different missingness patterns. Performance is worst in this scenario when the missing rate between those with and without an adverse event is the greatest. We found that randomized and Bayesian regression imputation methods mitigate the issue of learning the missingness pattern for tree-based methods. We used this information to build a novel early warning system for predicting patient deterioration in general wards and telemetry units: PICTURE (Predicting Intensive Care Transfers and other UnfoReseen Events). To develop, tune, and test PICTURE, we used labs and vital signs from electronic health records of adult patients over four years (n = 133,089 encounters). We analyzed primary outcomes of unplanned intensive care unit transfer, emergency vasoactive medication administration, cardiac arrest, and death. We compared PICTURE with existing early warning systems and logistic regression at multiple levels of granularity. 
When analyzing PICTURE on the testing set using all observations within a hospital encounter (event rate = 3.4%), PICTURE had an area under the receiver operating characteristic curve (AUROC) of 0.83 and an adjusted (event rate = 4%) area under the precision-recall curve (AUPR) of 0.27, while the next best tested method, regularized logistic regression, had an AUROC of 0.80 and an adjusted AUPR of 0.22. To ensure system interpretability, we applied a state-of-the-art prediction explainer that provided a ranked list of features contributing most to the prediction. Though it is currently difficult to compare machine learning-based early warning systems, a rudimentary comparison with published scores demonstrated that PICTURE is on par with state-of-the-art machine learning systems. To facilitate more robust comparisons and development of early warning systems in the future, we have released our variational autoencoder's code and weights so researchers can (a) test their models on data similar to our institution and (b) make their own synthetic datasets.

Rankin D, Black M, Bond R, Wallace J, Mulvenna M, Epelde G. Reliability of Supervised Machine Learning Using Synthetic Data in Health Care: Model to Preserve Privacy for Data Sharing. JMIR Med Inform 2020;8:e18910. [PMID: 32501278 PMCID: PMC7400044 DOI: 10.2196/18910] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2020] [Revised: 04/24/2020] [Accepted: 06/04/2020] [Indexed: 12/16/2022] Open

Abstract

BACKGROUND

The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce.

OBJECTIVE

This work sets out to understand the difference in performance of supervised machine learning models trained on synthetic data compared with those trained on real data.

METHODS

A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression trees, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed.
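The evaluation design described above (train on real or synthetic data, always test on real data) can be sketched in a few lines. This is a toy illustration under stated assumptions and not the study's pipeline: the "real" data are simulated, the generator is a simple per-class Gaussian standing in for a parametric synthesizer, and a nearest-centroid rule stands in for the five classifiers the study used. All function names are hypothetical.

```python
import random
import statistics


def make_real(n, rng):
    """Simulated stand-in for real data: one feature, two classes with shifted means."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


def synthesize(train, n, rng):
    """Toy parametric generator: fit a Gaussian per class, then sample from it."""
    by_class = {0: [], 1: []}
    for x, y in train:
        by_class[y].append(x)
    params = {y: (statistics.mean(xs), statistics.stdev(xs)) for y, xs in by_class.items()}
    prior_1 = len(by_class[1]) / len(train)
    out = []
    for _ in range(n):
        y = 1 if rng.random() < prior_1 else 0
        mu, sd = params[y]
        out.append((rng.gauss(mu, sd), y))
    return out


def fit_centroids(train):
    """Nearest-centroid classifier: store the mean feature value of each class."""
    return {y: statistics.mean([x for x, lab in train if lab == y]) for y in (0, 1)}


def accuracy(centroids, test):
    """Share of test records whose nearest class centroid matches their label."""
    hits = sum(min(centroids, key=lambda c: abs(x - centroids[c])) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)
real_train = make_real(500, rng)
real_test = make_real(500, rng)  # held-out real data, used for both models
synthetic_train = synthesize(real_train, 500, rng)

acc_real = accuracy(fit_centroids(real_train), real_test)
acc_synth = accuracy(fit_centroids(synthetic_train), real_test)
deviation = acc_real - acc_synth  # the kind of accuracy gap the study reports

print(f"real: {acc_real:.3f}  synthetic: {acc_synth:.3f}  deviation: {deviation:+.3f}")
```

Keeping the test set real for both models is the key design choice: the deviation then measures how much is lost by training on synthetic records, rather than how well a model fits the synthetic distribution itself.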

RESULTS

A total of 92% of models trained on synthetic data have lower accuracy than those trained on real data. Tree-based models trained on synthetic data have deviations in accuracy from models trained on real data of 0.177 (18%) to 0.193 (19%), while other models have lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data versus models trained on synthetic data and tested on real data is the same in 26% (5/19) of cases for classification and regression tree and parametric synthetic data and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models perform best with real data and are the winning classifier in 95% (18/19) of cases. This is not the case for models trained on synthetic data. When tree-based models are not considered, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility.

CONCLUSIONS

The results of this study are promising with small decreases in accuracy observed in models trained with synthetic data compared with models trained with real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers have some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must ensure individual privacy and data utility are preserved in order to instill confidence in health care departments when using such data to inform policy decision-making.
