1
Ruiz-Mateos Serrano R, Farina D, Malliaras GG. Body Surface Potential Mapping: A Perspective on High-Density Cutaneous Electrophysiology. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024:e2411087. [PMID: 39679757 DOI: 10.1002/advs.202411087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 10/28/2024] [Indexed: 12/17/2024]
Abstract
The electrophysiological signals recorded by cutaneous electrodes, known as body surface potentials (BSPs), are widely employed biomarkers in medical diagnosis. Despite their widespread application and success in detecting various conditions, the poor spatial resolution of traditional BSP measurements poses a limit to their diagnostic potential. Advancements in the field of bioelectronics have facilitated the creation of compact, high-quality, high-density recording arrays for cutaneous electrophysiology, allowing detailed spatial information acquisition as BSP maps (BSPMs). Currently, the design of electrode arrays for BSP mapping lacks a standardized framework, leading to customizations for each clinical study, limiting comparability, reproducibility, and transferability. This perspective proposes preliminary design guidelines, drawn from existing literature, rooted solely in the physical properties of electrophysiological signals and mathematical principles of signal processing. These guidelines aim to simplify and generalize the optimization process for electrode array design, fostering more effective and applicable clinical research. Moreover, the increased spatial information obtained from BSPMs introduces interpretation challenges. To mitigate this, two strategies are outlined: observational transformations that reconstruct signal sources for intuitive comprehension, and machine learning-driven diagnostics. BSP mapping offers significant advantages in cutaneous electrophysiology with respect to classic electrophysiological recordings and is expected to expand into broader clinical domains in the future.
Affiliation(s)
- Ruben Ruiz-Mateos Serrano
- Electrical Engineering Division, Department of Engineering, University of Cambridge, Cambridge, CB3 0FA, UK
- Dario Farina
- Department of Bioengineering, Faculty of Engineering, Imperial College London, London, W12 7TA, UK
- George G Malliaras
- Electrical Engineering Division, Department of Engineering, University of Cambridge, Cambridge, CB3 0FA, UK
2
Xu J, Hua Q, Jia X, Zheng Y, Hu Q, Bai B, Miao J, Zhu L, Zhang M, Tao R, Li Y, Luo T, Xie J, Zheng X, Gu P, Xing F, He C, Song Y, Dong Y, Xia S, Zhou J. Synthetic Breast Ultrasound Images: A Study to Overcome Medical Data Sharing Barriers. RESEARCH (WASHINGTON, D.C.) 2024; 7:0532. [PMID: 39628833 PMCID: PMC11612121 DOI: 10.34133/research.0532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2024] [Revised: 10/02/2024] [Accepted: 10/24/2024] [Indexed: 12/06/2024]
Abstract
The vast potential of medical big data to enhance healthcare outcomes remains underutilized due to privacy concerns, which restrict cross-center data sharing and the construction of diverse, large-scale datasets. To address this challenge, we developed a deep generative model aimed at synthesizing medical data to overcome data sharing barriers, with a focus on breast ultrasound (US) image synthesis. Specifically, we introduce CoLDiT, a conditional latent diffusion model with a transformer backbone, to generate US images of breast lesions across various Breast Imaging Reporting and Data System (BI-RADS) categories. Using a training dataset of 9,705 US images from 5,243 patients across 202 hospitals with diverse US systems, CoLDiT generated breast US images without duplicating private information, as confirmed through nearest-neighbor analysis. Blinded reader studies further validated the realism of these images, with area under the receiver operating characteristic curve (AUC) scores ranging from 0.53 to 0.77. Additionally, synthetic breast US images effectively augmented the training dataset for BI-RADS classification, achieving performance comparable to that using an equal-sized training set comprising solely real images (P = 0.81 for AUC). Our findings suggest that synthetic data, such as CoLDiT-generated images, offer a viable, privacy-preserving solution to facilitate secure medical data sharing and advance the utilization of medical big data.
Affiliation(s)
- JiaLe Xu
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- Qing Hua
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- XiaoHong Jia
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- YuHang Zheng
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- Qiao Hu
- Department of Ultrasound, The People’s Hospital of Guangxi Zhuang Autonomous Region, Nanning, 530021 Guangxi, China
- BaoYan Bai
- Department of Ultrasound, Yan’an University Affiliated Hospital, Yan’an, 716000 Shaanxi, China
- Juan Miao
- Department of Ultrasound, Zigong Fourth People’s Hospital, Zigong, 643000 Sichuan, China
- LiSha Zhu
- Department of Ultrasound, Yichun City People’s Hospital, Yichun, 336000 Jiangxi, China
- MeiXiang Zhang
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- RuoLin Tao
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- YuHeng Li
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- Ting Luo
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- Jun Xie
- Shanghai Aitrox Technology Corporation Limited, 200050 Shanghai, China
- XueBin Zheng
- Shanghai Aitrox Technology Corporation Limited, 200050 Shanghai, China
- PengChen Gu
- Shanghai Aitrox Technology Corporation Limited, 200050 Shanghai, China
- FengYuan Xing
- Shanghai Aitrox Technology Corporation Limited, 200050 Shanghai, China
- Chuan He
- Shanghai Aitrox Technology Corporation Limited, 200050 Shanghai, China
- YanYan Song
- Department of Biostatistics, Institute of Medical Sciences, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- YiJie Dong
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- ShuJun Xia
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- JianQiao Zhou
- Department of Ultrasound, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
- College of Health Science and Technology, Shanghai Jiao Tong University School of Medicine, 200025 Shanghai, China
3
Pati S, Kumar S, Varma A, Edwards B, Lu C, Qu L, Wang JJ, Lakshminarayanan A, Wang SH, Sheller MJ, Chang K, Singh P, Rubin DL, Kalpathy-Cramer J, Bakas S. Privacy preservation for federated learning in health care. PATTERNS (NEW YORK, N.Y.) 2024; 5:100974. [PMID: 39081567 PMCID: PMC11284498 DOI: 10.1016/j.patter.2024.100974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2024]
Abstract
Artificial intelligence (AI) shows potential to improve health care by leveraging data to build models that can inform clinical workflows. However, access to large quantities of diverse data is needed to develop robust generalizable models. Data sharing across institutions is not always feasible due to legal, security, and privacy concerns. Federated learning (FL) allows for multi-institutional training of AI models, obviating data sharing, albeit with different security and privacy concerns. Specifically, insights exchanged during FL can leak information about institutional data. In addition, FL can introduce issues when there is limited trust among the entities performing the compute. With the growing adoption of FL in health care, it is imperative to elucidate the potential risks. We thus summarize privacy-preserving FL literature in this work with special regard to health care. We draw attention to threats and review mitigation approaches. We anticipate this review to become a health-care researcher's guide to security and privacy in FL.
Affiliation(s)
- Sarthak Pati
- Center for Federated Learning in Medicine, Indiana University, Indianapolis, IN, USA
- Division of Computational Pathology, Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Sourav Kumar
- Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA
- Amokh Varma
- Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA
- Charles Lu
- Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA
- Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women’s Hospital, Boston, MA, USA
- Liangqiong Qu
- Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong, China
- Justin J. Wang
- Department of Biomedical Data Science, Radiology, and Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Ken Chang
- Department of Radiology, Stanford University, Stanford, CA, USA
- Praveer Singh
- University of Colorado School of Medicine, Aurora, CO, USA
- Daniel L. Rubin
- Department of Biomedical Data Science, Radiology, and Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Spyridon Bakas
- Center for Federated Learning in Medicine, Indiana University, Indianapolis, IN, USA
- Division of Computational Pathology, Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Neurological Surgery, Indiana University School of Medicine, Indianapolis, IN, USA
- Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University, Indianapolis, IN, USA
4
Chaudhry BM, Debi HR. User perceptions and experiences of an AI-driven conversational agent for mental health support. Mhealth 2024; 10:22. [PMID: 39114462 PMCID: PMC11304096 DOI: 10.21037/mhealth-23-55] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 06/05/2024] [Indexed: 08/10/2024] Open
Abstract
Background The increasing prevalence of artificial intelligence (AI)-driven mental health conversational agents necessitates a comprehensive understanding of user engagement and user perceptions of this technology. This study aims to fill the existing knowledge gap by focusing on Wysa, a commercially available mobile conversational agent designed to provide personalized mental health support. Methods A total of 159 user reviews posted between January, 2020 and March, 2024, on the Wysa app's Google Play page were collected. Thematic analysis was then used to perform open and inductive coding of the collected data. Results Seven major themes emerged from the user reviews: "a trusting environment promotes wellbeing", "ubiquitous access offers real-time support", "AI limitations detract from the user experience", "perceived effectiveness of Wysa", "desire for cohesive and predictable interactions", "humanness in AI is welcomed", and "the need for improvements in the user interface". These themes highlight both the benefits and limitations of the AI-driven mental health conversational agents. Conclusions Users find that Wysa is effective in fostering a strong connection with its users, encouraging them to engage with the app and take positive steps towards emotional resilience and self-improvement. However, its AI needs several improvements to enhance user experience with the application. The findings contribute to the design and implementation of more effective, ethical, and user-aligned AI-driven mental health support systems.
Affiliation(s)
- Beenish Moalla Chaudhry
- School of Computing and Informatics, Ray P. Authement College of Sciences, University of Louisiana at Lafayette, Lafayette, LA, USA
- Happy Rani Debi
- School of Computing and Informatics, Ray P. Authement College of Sciences, University of Louisiana at Lafayette, Lafayette, LA, USA
5
Brauneck A, Schmalhorst L, Weiss S, Baumbach L, Völker U, Ellinghaus D, Baumbach J, Buchholtz G. Legal aspects of privacy-enhancing technologies in genome-wide association studies and their impact on performance and feasibility. Genome Biol 2024; 25:154. [PMID: 38872191 PMCID: PMC11170858 DOI: 10.1186/s13059-024-03296-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 06/03/2024] [Indexed: 06/15/2024] Open
Abstract
Genomic data holds huge potential for medical progress but requires strict safety measures due to its sensitive nature to comply with data protection laws. This conflict is especially pronounced in genome-wide association studies (GWAS) which rely on vast amounts of genomic data to improve medical diagnoses. To ensure both their benefits and sufficient data security, we propose a federated approach in combination with privacy-enhancing technologies utilising the findings from a systematic review on federated learning and legal regulations in general and applying these to GWAS.
Affiliation(s)
- Alissa Brauneck
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany.
- Louisa Schmalhorst
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany
- Stefan Weiss
- Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- Linda Baumbach
- Department of Health Economics and Health Services Research, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Uwe Völker
- Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
- David Ellinghaus
- Institute of Clinical Molecular Biology (IKMB), Kiel University and University Medical Center Schleswig-Holstein, Kiel, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Gabriele Buchholtz
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany
6
Chen H, Pang J, Zhao Y, Giddens S, Ficek J, Valente MJ, Cao B, Daley E. A data-driven approach to choosing privacy parameters for clinical trial data sharing under differential privacy. J Am Med Inform Assoc 2024; 31:1135-1143. [PMID: 38457282 PMCID: PMC11031247 DOI: 10.1093/jamia/ocae038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2023] [Revised: 01/27/2024] [Accepted: 02/16/2024] [Indexed: 03/10/2024] Open
Abstract
OBJECTIVES Clinical trial data sharing is crucial for promoting transparency and collaborative efforts in medical research. Differential privacy (DP) is a formal statistical technique for anonymizing shared data that balances privacy of individual records and accuracy of replicated results through a "privacy budget" parameter, ε. DP is considered the state of the art in privacy-protected data publication and is underutilized in clinical trial data sharing. This study is focused on identifying ε values for the sharing of clinical trial data. MATERIALS AND METHODS We analyzed 2 clinical trial datasets with privacy budget ε ranging from 0.01 to 10. Smaller values of ε entail adding greater amounts of random noise, with better privacy as a result. Comparison of rates, odds ratios, means, and mean differences between the original clinical trial datasets and the empirical distribution of the DP estimator was performed. RESULTS The DP rate closely approximated the original rate of 6.5% when ε > 1. The DP odds ratio closely aligned with the original odds ratio of 0.689 when ε ≥ 3. The DP mean closely approximated the original mean of 164.64 when ε ≥ 1. As ε increased to 5, both the minimum and maximum DP means converged toward the original mean. DISCUSSION There is no consensus on how to choose the privacy budget ε. The definition of DP does not specify the required level of privacy, and there is no established formula for determining ε. CONCLUSION Our findings suggest that the application of DP holds promise in the context of sharing clinical trial data.
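The relationship between ε and noise described in this abstract can be illustrated with a minimal Python sketch of the Laplace mechanism applied to an event rate. This is not the authors' code or analysis pipeline: the function names, the choice of the Laplace mechanism, and the example counts (65 events among 1000 participants, chosen only so the rate equals the 6.5% quoted above) are assumptions made for illustration, and the sample size is treated as public.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_rate(events, n, epsilon, rng):
    """Release events/n under epsilon-DP using the Laplace mechanism.

    Adding or removing one record changes the event count by at most 1,
    so the count has sensitivity 1 and the noise scale is 1/epsilon.
    The sample size n is assumed to be public (an illustrative assumption).
    """
    noisy_count = events + laplace_noise(1.0 / epsilon, rng)
    return noisy_count / n


def mean_abs_error(events, n, epsilon, trials=2000, seed=0):
    """Average absolute gap between the DP rate and the true rate over many draws."""
    rng = random.Random(seed)
    true_rate = events / n
    total = sum(abs(dp_rate(events, n, epsilon, rng) - true_rate) for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Hypothetical trial: 65 events among 1000 participants (rate 6.5%).
    for eps in (0.01, 0.1, 1, 3, 10):
        print(f"epsilon={eps}: mean |error| = {mean_abs_error(65, 1000, eps):.5f}")
```

Running the loop shows the error shrinking as ε grows, which is the privacy-accuracy trade-off the abstract refers to; the specific thresholds reported in the study depend on its datasets and estimators and are not reproduced by this sketch.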
Affiliation(s)
- Henian Chen
- Study Design and Data Analysis, College of Public Health, University of South Florida, Tampa, FL 33612, United States
- Jinyong Pang
- Study Design and Data Analysis, College of Public Health, University of South Florida, Tampa, FL 33612, United States
- Yayi Zhao
- Study Design and Data Analysis, College of Public Health, University of South Florida, Tampa, FL 33612, United States
- Spencer Giddens
- Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN 46556, United States
- Joseph Ficek
- Oncology Statistics, GlaxoSmithKline, Collegeville, PA 19426, United States
- Matthew J Valente
- Study Design and Data Analysis, College of Public Health, University of South Florida, Tampa, FL 33612, United States
- Biwei Cao
- Study Design and Data Analysis, College of Public Health, University of South Florida, Tampa, FL 33612, United States
- Ellen Daley
- The Lawton and Rhea Chiles Center for Children and Families, College of Public Health, University of South Florida, Tampa, FL 33612, United States
7
Giuffrè M, Shung DL. Harnessing the power of synthetic data in healthcare: innovation, application, and privacy. NPJ Digit Med 2023; 6:186. [PMID: 37813960 PMCID: PMC10562365 DOI: 10.1038/s41746-023-00927-3] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 04/15/2023] [Accepted: 09/14/2023] [Indexed: 10/11/2023] Open
Abstract
Data-driven decision-making in modern healthcare underpins innovation and predictive analytics in public health and clinical research. Synthetic data has shown promise in finance and economics to improve risk assessment, portfolio optimization, and algorithmic trading. However, higher stakes, potential liabilities, and healthcare practitioner distrust make clinical use of synthetic data difficult. This paper explores the potential benefits and limitations of synthetic data in the healthcare analytics context. We begin with real-world healthcare applications of synthetic data that inform government policy, enhance data privacy, and augment datasets for predictive analytics. We then preview future applications of synthetic data in the emergent field of digital twin technology. We explore the issues of data quality and data bias in synthetic data, which can limit applicability across different applications in the clinical context, and privacy concerns stemming from data misuse and risk of re-identification. Finally, we evaluate the role of regulatory agencies in promoting transparency and accountability and propose strategies for risk mitigation such as Differential Privacy (DP) and a dataset chain of custody to maintain data integrity, traceability, and accountability. Synthetic data can improve healthcare, but measures to protect patient well-being and maintain ethical standards are key to promoting responsible use.
Affiliation(s)
- Mauro Giuffrè
- Department of Medicine (Digestive Diseases), Yale School of Medicine, Yale University, New Haven, CT, USA.
- Department of Medical, Surgical and Health Science, University of Trieste, Trieste, Italy.
- Dennis L Shung
- Department of Medicine (Digestive Diseases), Yale School of Medicine, Yale University, New Haven, CT, USA
8
Sperling K, Scherb H, Neitzel H. Population monitoring of trisomy 21: problems and approaches. Mol Cytogenet 2023; 16:6. [PMID: 37183244 PMCID: PMC10183086 DOI: 10.1186/s13039-023-00637-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 05/02/2023] [Indexed: 05/16/2023] Open
Abstract
Trisomy 21 (Down syndrome) is the most common autosomal aneuploidy among newborns. About 90% result from meiotic nondisjunction during oogenesis, which occurs around conception, when also the most profound epigenetic modifications take place. Thus, maternal meiosis is an error prone process with an extreme sensitivity to endogenous factors, as exemplified by maternal age. This contrasts with the missing acceptance of causal exogenous factors. The proof of an environmental agent is a great challenge, both with respect to ascertainment bias, determination of time and dosage of exposure, as well as registration of the relevant individual health data affecting the birth prevalence. Based on a few exemplary epidemiological studies the feasibility of trisomy 21 monitoring is illustrated. In the nearer future the methodical premises will be clearly improved, both due to the establishment of electronic health registers and to the introduction of non-invasive prenatal tests. Down syndrome is a sentinel phenotype, presumably also with regard to other congenital anomalies. Thus, monitoring of trisomy 21 offers new chances for risk avoidance and preventive measures, but also for basic research concerning identification of relevant genomic variants involved in chromosomal nondisjunction.
Affiliation(s)
- Karl Sperling
- Institute of Medical and Human Genetics, Charité-Universitaetsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany.
- Hagen Scherb
- Institute of Computational Biology, Helmholtz Zentrum München - German Research Center for Environmental Health, Neuherberg, Germany
- Heidemarie Neitzel
- Institute of Medical and Human Genetics, Charité-Universitaetsmedizin Berlin, Augustenburger Platz 1, 13353, Berlin, Germany
9
Schmitt J, Bierbaum T, Geraedts M, Gothe H, Härter M, Hoffmann F, Ihle P, Kramer U, Klinkhammer-Schalke M, Kuske S, March S, Reese JP, Schoffer O, Swart E, Vollmar HC, Walther F, Hoffmann W. Das Gesundheitsdatennutzungsgesetz – Potenzial für eine bessere Forschung und Gesundheitsversorgung [The Health Data Use Act: potential for better research and health care]. DAS GESUNDHEITSWESEN 2023; 85:215-222. [PMID: 36977473 PMCID: PMC10125338 DOI: 10.1055/a-2050-0429] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 03/30/2023]
Affiliation(s)
- Jochen Schmitt
- Zentrum für Evidenzbasierte Gesundheitsversorgung, Universitätsklinikum Carl Gustav Carus an der Technischen Universität Dresden, Dresden
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Max Geraedts
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Institut für Versorgungsforschung und Klinische Epidemiologie, Philipps-Universität Marburg, Marburg
- Holger Gothe
- Department für Public Health, Versorgungsforschung und Health Technology Assessment, UMIT, Hall in Tirol, Austria
- Hochschule Hannover, Fakultät III, Abt. Information und Kommunikation (IK), Hannover
- Lehrstuhl Gesundheitswissenschaften/Public Health, Medizinische Fakultät Carl Gustav Carus, TU Dresden, Dresden
- Arbeitsgruppe Erhebung und Nutzung von Sekundärdaten (AGENS)
- Martin Härter
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Institut und Poliklinik für Medizinische Psychologie, Universitätsklinikum Hamburg-Eppendorf
- Ärztliches Zentrum für Qualität in der Medizin (ÄZQ), Berlin
- Falk Hoffmann
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Department für Versorgungsforschung, Carl von Ossietzky Universität Oldenburg, Oldenburg
- Peter Ihle
- Arbeitsgruppe Erhebung und Nutzung von Sekundärdaten (AGENS)
- PMV forschungsgruppe, Medizinische Fakultät und Universitätsklinikum Köln, Universität zu Köln, Köln
- Ursula Kramer
- Deutsches Netzwerk Versorgungsforschung, Berlin
- sanawork Gesundheitskommunikation, Waldkirch
- Monika Klinkhammer-Schalke
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Tumorzentrum Regensburg, Zentrum für Qualitätssicherung und Versorgungsforschung, Universität Regensburg, Regensburg
- Silke Kuske
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Fliedner Fachhochschule Düsseldorf, Düsseldorf
- Stefanie March
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Hochschule Magdeburg-Stendal, Fachbereich Soziale Arbeit, Gesundheit und Medien
- Jens-Peter Reese
- Professur für Versorgungsforschung und Public Health Institut für Klinische Epidemiologie und Biometrie Julius-Maximilians-Universität Würzburg
- Olaf Schoffer
- Zentrum für Evidenzbasierte Gesundheitsversorgung, Universitätsklinikum Carl Gustav Carus an der Technischen Universität Dresden, Dresden
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Enno Swart
- Arbeitsgruppe Erhebung und Nutzung von Sekundärdaten (AGENS)
- Institut für Sozialmedizin und Gesundheitssystemforschung (ISMG), Medizinische Fakultät, Otto-von-Guericke Universität Magdeburg, Magdeburg
- Horst Christian Vollmar
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Abteilung für Allgemeinmedizin (AM RUB), Medizinische Fakultät, Ruhr-Universität Bochum, Bochum
- Felix Walther
- Zentrum für Evidenzbasierte Gesundheitsversorgung, Universitätsklinikum Carl Gustav Carus an der Technischen Universität Dresden, Dresden
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Qualitäts- und Medizinisches Risikomanagement, Universitätsklinikum Carl Gustav Carus an der Technischen Universität Dresden, Dresden
- Wolfgang Hoffmann
- Deutsches Netzwerk Versorgungsforschung, Berlin
- Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald
10
Artificial intelligence in uveitis: A comprehensive review. Surv Ophthalmol 2023:S0039-6257(23)00044-9. [PMID: 36878360 DOI: 10.1016/j.survophthal.2023.02.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/20/2022] [Revised: 02/25/2023] [Accepted: 02/27/2023] [Indexed: 03/07/2023]
Abstract
Uveitis is a disease complex characterized by intraocular inflammation of the uvea that is an important cause of blindness and social morbidity. With the dawn of artificial intelligence (AI) and machine learning integration in healthcare, their application in uveitis creates an avenue to improve screening and diagnosis. Our review identified the use of artificial intelligence in studies of uveitis and classified them as diagnosis support, finding detection, screening, and standardization of uveitis nomenclature. The overall performance of models is poor, with limited datasets and a lack of validation studies and publicly available data and codes. We conclude that AI holds great promise to assist with the diagnosis and detection of ocular findings of uveitis, but further studies and large representative datasets are needed to guarantee generalizability and fairness.
11
Federated machine learning in data-protection-compliant research. NAT MACH INTELL 2023. [DOI: 10.1038/s42256-022-00601-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
12
Elhussein A, Gürsoy G. Privacy-preserving patient clustering for personalized federated learning. PROCEEDINGS OF MACHINE LEARNING RESEARCH 2023; 219:150-166. [PMID: 39239484 PMCID: PMC11376435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Federated Learning (FL) is a machine learning framework that enables multiple organizations to train a model without sharing their data with a central server. However, it experiences significant performance degradation if the data is non-identically independently distributed (non-IID). This is a problem in medical settings, where variations in the patient population contribute significantly to distribution differences across hospitals. Personalized FL addresses this issue by accounting for site-specific distribution differences. Clustered FL, a Personalized FL variant, was used to address this problem by clustering patients into groups across hospitals and training separate models on each group. However, privacy concerns remained as a challenge as the clustering process requires exchange of patient-level information. This was previously solved by forming clusters using aggregated data, which led to inaccurate groups and performance degradation. In this study, we propose Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel Clustered FL framework that can cluster patients using patient-level data while protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic technique, to securely calculate patient-level similarity scores across hospitals. We then evaluate PCBFL by training a federated mortality prediction model using 20 sites from the eICU dataset. We compare the performance gain from PCBFL against traditional and existing Clustered FL frameworks. Our results show that PCBFL successfully forms clinically meaningful cohorts of low, medium, and high-risk patients. PCBFL outperforms traditional and existing Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC improvement of 7.8%.
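As a rough illustration of the Secure Multiparty Computation idea mentioned in this abstract, the Python sketch below uses additive secret sharing so that several hospitals can learn the sum of their private values without any single share revealing an input. It is not the PCBFL protocol, which computes patient-level similarity scores; the function names, the modulus, and the example values are hypothetical, all parties are simulated in one process, and participants are assumed to follow the protocol honestly.

```python
import secrets

# Public modulus; inputs must be non-negative and their sum must stay below it.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n_parties additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum across parties, each holding one private value.

    Party i splits its value into one share per party and sends share j to
    party j. Each party adds the shares it received and publishes only that
    partial sum; the partial sums add up to the true total.
    """
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME


if __name__ == "__main__":
    # Hypothetical per-hospital counts that should not be revealed individually.
    hospital_counts = [12, 30, 7]
    print(secure_sum(hospital_counts))
```

Each individual share is uniformly random on its own, so a party that sees only the shares sent to it learns nothing about another hospital's value beyond what the final total implies.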
Affiliation(s)
- Ahmed Elhussein
- Department of Biomedical Informatics, Columbia University, New York Genome Center, New York City, NY, U.S.A
- Gamze Gürsoy
- Department of Biomedical Informatics, Department of Computer Science, Columbia University, New York Genome Center, New York City, NY, U.S.A
13
Gu T, Lee PH, Duan R. COMMUTE: Communication-efficient transfer learning for multi-site risk prediction. J Biomed Inform 2023; 137:104243. [PMID: 36403757 PMCID: PMC9868117 DOI: 10.1016/j.jbi.2022.104243] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2022] [Revised: 09/20/2022] [Accepted: 11/06/2022] [Indexed: 11/19/2022]
Abstract
OBJECTIVES: We propose a communication-efficient transfer learning approach (COMMUTE) that effectively incorporates multi-site healthcare data for training a risk prediction model in a target population of interest, accounting for challenges including population heterogeneity and data sharing constraints across sites.
METHODS: We first train population-specific source models locally within each site. Using data from a given target population, COMMUTE learns a calibration term for each source model, which adjusts for potential data heterogeneity through flexible distance-based regularizations. In a centralized setting where multi-site data can be directly pooled, all data are combined to train the target model after calibration. When individual-level data are not shareable in some sites, COMMUTE requests only the locally trained models from these sites, with which, COMMUTE generates heterogeneity-adjusted synthetic data for training the target model. We evaluate COMMUTE via extensive simulation studies and an application to multi-site data from the electronic Medical Records and Genomics (eMERGE) Network to predict extreme obesity.
RESULTS: Simulation studies show that COMMUTE outperforms methods without adjusting for population heterogeneity and methods trained in a single population over a broad spectrum of settings. Using eMERGE data, COMMUTE achieves an area under the receiver operating characteristic curve (AUC) around 0.80, which outperforms other benchmark methods with AUC ranging from 0.51 to 0.70.
CONCLUSION: COMMUTE improves the risk prediction in a target population with limited samples and safeguards against negative transfer when some source populations are highly different from the target. In a federated setting, it is highly communication efficient as it only requires each site to share model parameter estimates once, and no iterative communication or higher-order terms are needed.
Affiliation(s)
- Tian Gu
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Phil H Lee
- Department of Psychiatry, Harvard Medical School, Boston, MA, United States; Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Rui Duan
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
14
Kurz CF, König AN, Emmert‐Fees KMF, Allen LD. The effect of differential privacy on Medicaid participation among racial and ethnic minority groups. Health Serv Res 2022; 57 Suppl 2:207-213. [PMID: 35524543 PMCID: PMC9660420 DOI: 10.1111/1475-6773.14000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023] Open
Abstract
OBJECTIVE: To investigate how county and state-level estimates of Medicaid enrollment among the total, non-Hispanic White, non-Hispanic Black or African American, and Hispanic or Latino/a population are affected by Differential Privacy (DP), where statistical noise is added to the public decennial US census data to protect individual privacy.
DATA SOURCES: We obtained population counts from the final version of the US Census Bureau Differential Privacy Demonstration Products from 2010 and combined them with Medicaid enrollment data.
STUDY DESIGN: We compared 2010 county and state-level population counts released under the traditional disclosure avoidance techniques and the ones produced with the proposed DP procedures.
DATA COLLECTION/EXTRACTION METHODS: Not applicable.
PRINCIPAL FINDINGS: We find the DP method introduces errors up to 10% into counts and proportions of Medicaid participation rate accuracy at the county level, especially for small subpopulations and racial and ethnic minority groups. The effect of DP on Medicaid participation rate accuracy is only small and negligible at the state level.
CONCLUSIONS: The implementation of DP in the 2020 census can affect the analyses of health disparities and health care access and use among different subpopulations in the United States. The planned implementation of DP in other census-related surveys such as the American Community Survey can misrepresent Medicaid participation rates for small racial and ethnic minority groups. This can affect Medicaid funding decisions.
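To illustrate why added noise matters more for small groups than for large ones, the sketch below applies the textbook Laplace mechanism to two counts of very different size and compares the mean relative error. This is an assumption made for illustration only: the abstract does not describe the Census Bureau's actual procedure, and the privacy budget `epsilon`, the counts, and the function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Release a count under the Laplace mechanism (a count has sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_relative_error(true_count, epsilon, trials=10_000, seed=0):
    """Average |noisy - true| / true over repeated noisy releases."""
    rng = random.Random(seed)
    total = sum(
        abs(noisy_count(true_count, epsilon, rng) - true_count)
        for _ in range(trials)
    )
    return total / trials / true_count


# Hypothetical counts: a small county subgroup versus a state-level total.
small_group_error = mean_relative_error(true_count=50, epsilon=0.5)
state_level_error = mean_relative_error(true_count=500_000, epsilon=0.5)
```

Because the noise scale depends only on `epsilon` and not on the size of the count, the same absolute noise is a far larger fraction of a small subgroup count than of a state total, which is consistent with the county-versus-state pattern the abstract reports.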
Affiliation(s)
- Christoph F. Kurz
- Munich School of Management and Munich Center of Health Sciences, Ludwig‐Maximilians‐Universität München, Munich, Germany; Institute of Health Economics and Health Care Management, Helmholtz Zentrum München, Neuherberg, Germany
- Adriana N. König
- Munich School of Management and Munich Center of Health Sciences, Ludwig‐Maximilians‐Universität München, Munich, Germany; Institute of Health Economics and Health Care Management, Helmholtz Zentrum München, Neuherberg, Germany
- Karl M. F. Emmert‐Fees
- Institute of Health Economics and Health Care Management, Helmholtz Zentrum München, Neuherberg, Germany; Department of Sport and Health Sciences, Technical University of Munich, Munich, Germany
- Lindsay D. Allen
- Department of Emergency Medicine, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
15
Halfpenny W, Baxter SL. Towards effective data sharing in ophthalmology: data standardization and data privacy. Curr Opin Ophthalmol 2022; 33:418-424. [PMID: 35819893 PMCID: PMC9357189 DOI: 10.1097/icu.0000000000000878] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
PURPOSE OF REVIEW: The purpose of this review is to provide an overview of updates in data standardization and data privacy in ophthalmology. These topics represent two key aspects of medical information sharing and are important knowledge areas given trends in data-driven healthcare.
RECENT FINDINGS: Standardization and privacy can be seen as complementary aspects that pertain to data sharing. Standardization promotes the ease and efficacy through which data is shared. Privacy considerations ensure that data sharing is appropriate and sufficiently controlled. There is active development in both areas, including government regulations and common data models to advance standardization, and application of technologies such as blockchain and synthetic data to help tackle privacy issues. These advancements have seen use in ophthalmology, but there are areas where further work is required.
SUMMARY: Information sharing is fundamental to both research and care delivery, and standardization/privacy are key constituent considerations. Therefore, widespread engagement with, and development of, data standardization and privacy ecosystems stand to offer great benefit to ophthalmology.
Affiliation(s)
- Sally L. Baxter
- Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA
- Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
16
Thambawita V, Salehi P, Sheshkal SA, Hicks SA, Hammer HL, Parasa S, de Lange T, Halvorsen P, Riegler MA. SinGAN-Seg: Synthetic training data generation for medical image segmentation. PLoS One 2022; 17:e0267976. [PMID: 35500005 PMCID: PMC9060378 DOI: 10.1371/journal.pone.0267976] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/29/2021] [Accepted: 04/19/2022] [Indexed: 12/20/2022] Open
Abstract
Analyzing medical data to find abnormalities is a time-consuming and costly task, particularly for rare abnormalities, requiring tremendous efforts from medical experts. Therefore, artificial intelligence has become a popular tool for the automatic processing of medical data, acting as a supportive tool for doctors. However, the machine learning models used to build these tools are highly dependent on the data used to train them. Large amounts of data can be difficult to obtain in medicine due to privacy reasons, expensive and time-consuming annotations, and a general lack of data samples for infrequent lesions. In this study, we present a novel synthetic data generation pipeline, called SinGAN-Seg, to produce synthetic medical images with corresponding masks using a single training image. Our method differs from traditional generative adversarial networks (GANs) because our model needs only a single image and the corresponding ground truth to train. We also show that the synthetic data generation pipeline can be used to produce alternative artificial segmentation datasets with corresponding ground truth masks when real datasets cannot be shared. The pipeline is evaluated using qualitative and quantitative comparisons between real data and synthetic data to show that the style transfer technique used in our pipeline significantly improves the quality of the generated data and that our method is better than other state-of-the-art GANs at preparing synthetic images when the size of the training dataset is limited. By training UNet++ using both real data and the synthetic data generated from the SinGAN-Seg pipeline, we show that the models trained on synthetic data have very close performances to those trained on real data when both datasets have a considerable amount of training data. In contrast, we show that synthetic data generated from the SinGAN-Seg pipeline improves the performance of segmentation models when training datasets do not have a considerable amount of data. All experiments were performed using an open dataset and the code is publicly available on GitHub.
Affiliation(s)
- Vajira Thambawita
- SimulaMet, Oslo, Norway
- Oslo Metropolitan University, Oslo, Norway
- Hugo L. Hammer
- SimulaMet, Oslo, Norway
- Oslo Metropolitan University, Oslo, Norway
- Sravanthi Parasa
- Department of Gastroenterology, Swedish Medical Group, Seattle, WA, United States of America
- Thomas de Lange
- Medical Department, Sahlgrenska University Hospital-Möndal, Gothenburg, Sweden
- Department of Molecular and Clinical Medicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Augere Medical, Oslo, Norway
- Pål Halvorsen
- SimulaMet, Oslo, Norway
- Oslo Metropolitan University, Oslo, Norway
17
Zhang Z, Yan C, Malin BA. Membership inference attacks against synthetic health data. J Biomed Inform 2022; 125:103977. [PMID: 34920126 PMCID: PMC8766950 DOI: 10.1016/j.jbi.2021.103977] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/15/2021] [Revised: 11/17/2021] [Accepted: 12/08/2021] [Indexed: 01/03/2023]
Abstract
Synthetic data generation has emerged as a promising method to protect patient privacy while sharing individual-level health data. Intuitively, sharing synthetic data should reduce disclosure risks because no explicit linkage is retained between the synthetic records and the real data upon which it is based. However, the risks associated with synthetic data are still evolving, and what seems protected today may not be tomorrow. In this paper, we show that membership inference attacks, whereby an adversary infers if the data from certain target individuals (known to the adversary a priori) were relied upon by the synthetic data generation process, can be substantially enhanced through state-of-the-art machine learning frameworks, which calls into question the protective nature of existing synthetic data generators. Specifically, we formulate the membership inference problem from the perspective of the data holder, who aims to perform a disclosure risk assessment prior to sharing any health data. To support such an assessment, we introduce a framework for effective membership inference against synthetic health data without specific assumptions about the generative model or a well-defined data structure, leveraging the principles of contrastive representation learning. To illustrate the potential for such an attack, we conducted experiments against synthesis approaches using two datasets derived from several health data resources (Vanderbilt University Medical Center, the All of Us Research Program) to determine the upper bound of risk brought by an adversary who invokes an optimal strategy. The results indicate that partially synthetic data are vulnerable to membership inference at a very high rate. By contrast, fully synthetic data are only marginally susceptible and, in most cases, could be deemed sufficiently protected from membership inference.
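For readers unfamiliar with membership inference, the sketch below shows the basic idea with a deliberately simple heuristic: a target record is guessed to be a training member if some synthetic record lies very close to it. This nearest-record distance test is swapped in for illustration and is not the contrastive representation learning framework described in the abstract; the records, the threshold, and the function names are all hypothetical.

```python
import math


def nearest_distance(record, synthetic):
    """Euclidean distance from a target record to its closest synthetic record."""
    return min(math.dist(record, s) for s in synthetic)


def infer_membership(targets, synthetic, threshold):
    """Guess 'member' when a synthetic record is within the distance threshold."""
    return [nearest_distance(t, synthetic) <= threshold for t in targets]


# Toy, made-up records with two normalized features each.
synthetic_records = [(0.10, 0.20), (0.90, 0.80), (0.50, 0.50)]
target_records = [(0.10, 0.21), (0.30, 0.90)]

guesses = infer_membership(target_records, synthetic_records, threshold=0.05)
```

A data holder could run a test of this kind before release as a crude disclosure-risk check. The intuition is that records which retain real attribute values tend to sit unusually close to their source records, although a distance threshold alone is a much weaker adversary than the learned attack the paper evaluates.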
Affiliation(s)
- Ziqi Zhang
- Vanderbilt University, 2525 West End Avenue, Nashville, TN 37240 (corresponding author)
- Chao Yan
- Vanderbilt University, 2525 West End Avenue, Nashville, TN 37240
- Bradley A. Malin
- Vanderbilt University, 2525 West End Avenue, Nashville, TN 37240; Vanderbilt University Medical Center, 2525 West End Avenue, Nashville, TN 37240
18
Bakken S. Climate change, security, privacy, and data sharing: Important areas for advocacy and informatics solutions. J Am Med Inform Assoc 2021; 28:2072-2073. [PMID: 34536285 DOI: 10.1093/jamia/ocab188] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/19/2021] [Revised: 08/19/2021] [Accepted: 08/19/2021] [Indexed: 11/12/2022] Open
Affiliation(s)
- Suzanne Bakken
- School of Nursing, Department of Biomedical Informatics, and Data Science Institute, Columbia University, New York, New York, USA