1
Düsing C, Cimiano P, Rehberg S, Scherer C, Kaup O, Köster C, Hellmich S, Herrmann D, Meier KL, Claßen S, Borgstedt R. Integrating federated learning for improved counterfactual explanations in clinical decision support systems for sepsis therapy. Artif Intell Med 2024; 157:102982. [PMID: 39277983 DOI: 10.1016/j.artmed.2024.102982] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Revised: 08/29/2024] [Accepted: 09/01/2024] [Indexed: 09/17/2024]
Abstract
In recent years, we have witnessed both artificial intelligence obtaining remarkable results in clinical decision support systems (CDSSs) and explainable artificial intelligence (XAI) improving the interpretability of these models. In turn, this fosters the adoption by medical personnel and improves trustworthiness of CDSSs. Among others, counterfactual explanations prove to be one such XAI technique particularly suitable for the healthcare domain due to its ease of interpretation, even for less technically proficient staff. However, the generation of high-quality counterfactuals relies on generative models for guidance. Unfortunately, training such models requires a huge amount of data that is beyond the means of ordinary hospitals. In this paper, we therefore propose to use federated learning to allow multiple hospitals to jointly train such generative models while maintaining full data privacy. We demonstrate the superiority of our approach compared to locally generated counterfactuals. Moreover, we prove that generative models for counterfactual generation that are trained using federated learning in a suitable environment perform only marginally worse compared to centrally trained ones while offering the benefit of data privacy preservation. Finally, we integrate our method into a prototypical CDSS for treatment recommendation for sepsis patients, thus providing a proof of concept for real-world application as well as insights and sanity checks from clinical application.
Affiliation(s)
- Christoph Düsing
  - Center for Cognitive Interaction Technology, Bielefeld University, Inspiration 1, Bielefeld, 33619, Germany.
- Philipp Cimiano
  - Center for Cognitive Interaction Technology, Bielefeld University, Inspiration 1, Bielefeld, 33619, Germany.
- Sebastian Rehberg
  - Department of Anaesthesiology, Intensive Care, Emergency Medicine, Transfusion Medicine and Pain Therapy, University Hospital OWL, Campus Bielefeld-Bethel, Protestant Hospital of the Bethel Foundation, Burgsteig 13, Bielefeld, 33617, Germany.
- Christiane Scherer
  - Institute of Laboratory Medicine, Microbiology and Hygiene, University Hospital OWL, Campus Bielefeld-Bethel, Protestant Hospital of the Bethel Foundation, Burgsteig 13, Bielefeld, 33617, Germany.
- Olaf Kaup
  - Institute of Laboratory Medicine, Microbiology and Transfusion Medicine, University Hospital OWL, Campus Bielefeld Hospital, Teutoburger Straße 50, Bielefeld, 33604, Germany.
- Christiane Köster
  - University Clinic for Cardiology and Internal Intensive Care Medicine, University Hospital OWL, Campus Bielefeld Hospital, Teutoburger Straße 50, Bielefeld, 33604, Germany.
- Stefan Hellmich
  - Department of Anesthesiology, Surgical Intensive Care Medicine, Emergency Medicine and Pain Therapy, University Hospital OWL, Campus Bielefeld Hospital, Teutoburger Straße 50, Bielefeld, 33604, Germany.
- Daniel Herrmann
  - Department of Anesthesiology, Surgical Intensive Care Medicine, Emergency Medicine and Pain Therapy, University Hospital OWL, Campus Bielefeld Hospital, Teutoburger Straße 50, Bielefeld, 33604, Germany.
- Kirsten Laura Meier
  - Department of Anesthesiology, Surgical Intensive Care Medicine, Emergency Medicine and Pain Therapy, University Hospital OWL, Campus Bielefeld Hospital, Teutoburger Straße 50, Bielefeld, 33604, Germany.
- Simon Claßen
  - Department of Anesthesiology, Surgical Intensive Care Medicine, Emergency Medicine and Pain Therapy, University Hospital OWL, Campus Bielefeld Hospital, Teutoburger Straße 50, Bielefeld, 33604, Germany.
- Rainer Borgstedt
  - Department of Anaesthesiology, Intensive Care, Emergency Medicine, Transfusion Medicine and Pain Therapy, University Hospital OWL, Campus Bielefeld-Bethel, Protestant Hospital of the Bethel Foundation, Burgsteig 13, Bielefeld, 33617, Germany.
2
Guerreiro J, Garriga R, Lozano Bagén T, Sharma B, Karnik NS, Matić A. Transatlantic transferability and replicability of machine-learning algorithms to predict mental health crises. NPJ Digit Med 2024; 7:227. [PMID: 39251868 PMCID: PMC11384787 DOI: 10.1038/s41746-024-01203-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 07/29/2024] [Indexed: 09/11/2024] Open
Abstract
Transferring and replicating predictive algorithms across healthcare systems constitutes a unique yet crucial challenge that needs to be addressed to enable the widespread adoption of machine learning in healthcare. In this study, we explored the impact of important differences across healthcare systems and the associated Electronic Health Records (EHRs) on machine-learning algorithms to predict mental health crises, up to 28 days in advance. We evaluated both the transferability and replicability of such machine learning models, and for this purpose, we trained six models using features and methods developed on EHR data from the Birmingham and Solihull Mental Health NHS Foundation Trust in the UK. These machine learning models were then used to predict the mental health crises of 2907 patients seen at the Rush University System for Health in the US between 2018 and 2020. The best one was trained on a combination of US-specific structured features and frequency features from anonymized patient notes and achieved an AUROC of 0.837. A model with comparable performance, originally trained using UK structured data, was transferred and then tuned using US data, achieving an AUROC of 0.826. Our findings establish the feasibility of transferring and replicating machine learning models to predict mental health crises across diverse hospital systems.
Affiliation(s)
- Roger Garriga
  - Koa Health, Barcelona, Spain
  - Universitat Pompeu Fabra, Department of Information and Communication Technologies, Barcelona, Spain
3
Slotman DJ, Bartels LW, Nijholt IM, Huirne JAF, Moonen CTW, Boomsma MF. Development and validation of a deep learning-based method for automatic measurement of uterus, fibroid, and ablated volume in MRI after MR-HIFU treatment of uterine fibroids. Eur J Radiol 2024; 178:111602. [PMID: 38991285 DOI: 10.1016/j.ejrad.2024.111602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 06/21/2024] [Accepted: 07/02/2024] [Indexed: 07/13/2024]
Abstract
INTRODUCTION The non-perfused volume divided by total fibroid load (NPV/TFL) is a predictive outcome parameter for MRI-guided high-intensity focused ultrasound (MR-HIFU) treatments of uterine fibroids, which is related to long-term symptom relief. In current clinical practice, the MR-HIFU outcome parameters are typically determined by visual inspection, so an automated computer-aided method could facilitate objective outcome quantification. The objective of this study was to develop and evaluate a deep learning-based segmentation algorithm for volume measurements of the uterus, uterine fibroids, and NPVs in MRI in order to automatically quantify the NPV/TFL. MATERIALS AND METHODS A segmentation pipeline was developed and evaluated using expert manual segmentations of MRI scans of 115 uterine fibroid patients, screened for and/or undergoing MR-HIFU treatment. The pipeline contained three separate neural networks, one per target structure. The first step in the pipeline was uterus segmentation from contrast-enhanced (CE)-T1w scans. This segmentation was subsequently used to remove non-uterus background tissue for NPV and fibroid segmentation. In the following step, NPVs were segmented from uterus-only CE-T1w scans. Finally, fibroids were segmented from uterus-only T2w scans. The segmentations were used to calculate the volume for each structure. Reliability and agreement between manual and automatic segmentations, volumes, and NPV/TFLs were assessed. RESULTS For treatment scans, the Dice similarity coefficients (DSC) between the manually and automatically obtained segmentations were 0.90 (uterus), 0.84 (NPV) and 0.74 (fibroid). Intraclass correlation coefficients (ICC) were 1.00 [0.99, 1.00] (uterus), 0.99 [0.98, 1.00] (NPV) and 0.98 [0.95, 0.99] (fibroid) between manually and automatically derived volumes. For manually and automatically derived NPV/TFLs, the mean difference was 5% [-41%, 51%] (ICC: 0.66 [0.32, 0.85]). 
CONCLUSION The algorithm presented in this study automatically calculates uterus volume, fibroid load, and NPVs, which could lead to more objective outcome quantification after MR-HIFU treatments of uterine fibroids in comparison to visual inspection. When robustness has been ascertained in a future study, this tool may eventually be employed in clinical practice to automatically measure the NPV/TFL after MR-HIFU procedures of uterine fibroids.
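As an illustration of the metrics reported in this abstract: the Dice similarity coefficient between two binary masks A and B is 2|A∩B| / (|A| + |B|), and NPV/TFL is the non-perfused volume divided by the total fibroid volume. The sketch below is a minimal pure-Python version under stated assumptions. Masks are flat sequences of 0/1 voxels, and the function names, toy masks, and voxel volume are hypothetical; none of this is taken from the paper's pipeline.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * overlap / total


def volume_ml(mask, voxel_volume_mm3):
    """Volume of a binary mask in millilitres (1 mL = 1000 mm^3)."""
    return sum(1 for v in mask if v) * voxel_volume_mm3 / 1000.0


def npv_tfl_ratio(npv_mask, fibroid_mask, voxel_volume_mm3):
    """Non-perfused volume divided by total fibroid load."""
    fibroid_volume = volume_ml(fibroid_mask, voxel_volume_mm3)
    if fibroid_volume == 0:
        raise ValueError("fibroid volume is zero; ratio is undefined")
    return volume_ml(npv_mask, voxel_volume_mm3) / fibroid_volume


# Toy example: hypothetical manual vs. automatic masks over six voxels.
manual = [0, 1, 1, 1, 0, 0]
automatic = [0, 0, 1, 1, 1, 0]
dice = dice_coefficient(manual, automatic)  # overlap 2, sizes 3 + 3
```

Because the ratio divides one segmented volume by another, small errors in either mask can compound, which is one plausible reading of why the reported NPV/TFL agreement is weaker than the per-structure volume agreement.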
Affiliation(s)
- Derk J Slotman
  - Department of Radiology, Isala, Zwolle, the Netherlands; Imaging & Oncology Division, University Medical Center Utrecht, Utrecht, the Netherlands.
- Lambertus W Bartels
  - Imaging & Oncology Division, University Medical Center Utrecht, Utrecht, the Netherlands
- Ingrid M Nijholt
  - Department of Radiology, Isala, Zwolle, the Netherlands; Imaging & Oncology Division, University Medical Center Utrecht, Utrecht, the Netherlands
- Judith A F Huirne
  - Department of Obstetrics and Gynaecology, Amsterdam UMC, Amsterdam, the Netherlands; Amsterdam Reproduction and Development, Amsterdam, the Netherlands
- Chrit T W Moonen
  - Imaging & Oncology Division, University Medical Center Utrecht, Utrecht, the Netherlands; Focused Ultrasound Foundation, Charlottesville, VA, United States of America
- Martijn F Boomsma
  - Department of Radiology, Isala, Zwolle, the Netherlands; Imaging & Oncology Division, University Medical Center Utrecht, Utrecht, the Netherlands
4
Pati S, Kumar S, Varma A, Edwards B, Lu C, Qu L, Wang JJ, Lakshminarayanan A, Wang SH, Sheller MJ, Chang K, Singh P, Rubin DL, Kalpathy-Cramer J, Bakas S. Privacy preservation for federated learning in health care. PATTERNS (NEW YORK, N.Y.) 2024; 5:100974. [PMID: 39081567 PMCID: PMC11284498 DOI: 10.1016/j.patter.2024.100974] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/02/2024]
Abstract
Artificial intelligence (AI) shows potential to improve health care by leveraging data to build models that can inform clinical workflows. However, access to large quantities of diverse data is needed to develop robust generalizable models. Data sharing across institutions is not always feasible due to legal, security, and privacy concerns. Federated learning (FL) allows for multi-institutional training of AI models, obviating data sharing, albeit with different security and privacy concerns. Specifically, insights exchanged during FL can leak information about institutional data. In addition, FL can introduce issues when there is limited trust among the entities performing the compute. With the growing adoption of FL in health care, it is imperative to elucidate the potential risks. We thus summarize privacy-preserving FL literature in this work with special regard to health care. We draw attention to threats and review mitigation approaches. We anticipate this review to become a health-care researcher's guide to security and privacy in FL.
Affiliation(s)
- Sarthak Pati
  - Center for Federated Learning in Medicine, Indiana University, Indianapolis, IN, USA
  - Division of Computational Pathology, Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Sourav Kumar
  - Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA
- Amokh Varma
  - Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA
- Charles Lu
  - Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Boston, MA, USA
  - Center for Clinical Data Science, Massachusetts General Hospital and Brigham and Women’s Hospital, Boston, MA, USA
- Liangqiong Qu
  - Department of Statistics and Actuarial Science, University of Hong Kong, Hong Kong, China
- Justin J. Wang
  - Department of Biomedical Data Science, Radiology, and Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Ken Chang
  - Department of Radiology, Stanford University, Stanford, CA, USA
- Praveer Singh
  - University of Colorado School of Medicine, Aurora, CO, USA
- Daniel L. Rubin
  - Department of Biomedical Data Science, Radiology, and Medicine (Biomedical Informatics), Stanford University, Stanford, CA, USA
- Spyridon Bakas
  - Center for Federated Learning in Medicine, Indiana University, Indianapolis, IN, USA
  - Division of Computational Pathology, Department of Pathology and Laboratory Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Radiology and Imaging Sciences, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Neurological Surgery, Indiana University School of Medicine, Indianapolis, IN, USA
  - Department of Computer Science, Luddy School of Informatics, Computing, and Engineering, Indiana University, Indianapolis, IN, USA
5
González-García J, González-Galindo J, Estupiñán-Romero F, Thißen M, Lyons RA, Telleria-Orriols C, Bernal-Delgado E. PHIRI: lessons for an extensive reuse of sensitive data in federated health research. Eur J Public Health 2024; 34:i43-i49. [PMID: 38946447 PMCID: PMC11215320 DOI: 10.1093/eurpub/ckae036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024] Open
Abstract
BACKGROUND The extensive and continuous reuse of sensitive health data could enhance the role of population health research on public decisions. This paper describes the design principles and the different building blocks that have supported the implementation and deployment of Population Health Information Research Infrastructure (PHIRI), the strengths and challenges of the approach and some future developments. METHODS The design and implementation of PHIRI have been developed upon: (i) the data visiting principle: data does not move but code moves; (ii) the orchestration of the research question throughout a workflow that ensured legal, organizational, semantic and technological interoperability and (iii) a 'master-worker' federated computational architecture that supported the development of four use cases. RESULTS Nine participant nodes and 28 Euro-Peristat members completed the deployment of the infrastructure according to the expected outputs. As a consequence, each use case produced and published its own common data model, the analytical pipeline and the corresponding research outputs. All the digital objects were developed and published according to Open Science and FAIR principles. CONCLUSION PHIRI has successfully supported the development of four use cases in a federated manner, overcoming limitations for the reuse of sensitive health data and providing a methodology to achieve interoperability in multiple research nodes.
Affiliation(s)
- Juan González-García
  - Data Sciences for Health Services and Policy Research, Institute for Health Sciences in Aragón (IACS), Zaragoza, Spain
- Javier González-Galindo
  - Data Sciences for Health Services and Policy Research, Institute for Health Sciences in Aragón (IACS), Zaragoza, Spain
- Francisco Estupiñán-Romero
  - Data Sciences for Health Services and Policy Research, Institute for Health Sciences in Aragón (IACS), Zaragoza, Spain
- Martin Thißen
  - Department of Epidemiology and Health Monitoring, Robert Koch Institute, Berlin, Germany
- Ronan A Lyons
  - Population Data Science, Swansea University Medical School, Faculty of Medicine, Health, and Life Science, Swansea University, Swansea, Swansea, UK
- Carlos Telleria-Orriols
  - Data Sciences for Health Services and Policy Research, Institute for Health Sciences in Aragón (IACS), Zaragoza, Spain
- Enrique Bernal-Delgado
  - Data Sciences for Health Services and Policy Research, Institute for Health Sciences in Aragón (IACS), Zaragoza, Spain
6
Guan H, Yap PT, Bozoki A, Liu M. Federated learning for medical image analysis: A survey. PATTERN RECOGNITION 2024; 151:110424. [PMID: 38559674 PMCID: PMC10976951 DOI: 10.1016/j.patcog.2024.110424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Machine learning in medical imaging often faces a fundamental dilemma, namely, the small sample size problem. Many recent studies suggest using multi-domain data pooled from different acquisition sites/centers to improve statistical power. However, medical images from different sites cannot be easily shared to build large datasets for model training due to privacy protection reasons. As a promising solution, federated learning, which enables collaborative training of machine learning models based on data from different sites without cross-site data sharing, has attracted considerable attention recently. In this paper, we conduct a comprehensive survey of the recent development of federated learning methods in medical image analysis. We have systematically gathered research papers on federated learning and its applications in medical image analysis published between 2017 and 2023. Our search and compilation were conducted using databases from IEEE Xplore, ACM Digital Library, Science Direct, Springer Link, Web of Science, Google Scholar, and PubMed. In this survey, we first introduce the background of federated learning for dealing with privacy protection and collaborative learning issues. We then present a comprehensive review of recent advances in federated learning methods for medical image analysis. Specifically, existing methods are categorized based on three critical aspects of a federated learning system, including client end, server end, and communication techniques. In each category, we summarize the existing federated learning methods according to specific research problems in medical image analysis and also provide insights into the motivations of different approaches. In addition, we provide a review of existing benchmark medical imaging datasets and software platforms for current federated learning research. We also conduct an experimental study to empirically evaluate typical federated learning methods for medical image analysis. 
This survey can help to better understand the current research status, challenges, and potential research opportunities in this promising research field.
Affiliation(s)
- Hao Guan
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Pew-Thian Yap
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Andrea Bozoki
  - Department of Neurology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Mingxia Liu
  - Department of Radiology and Biomedical Research Imaging Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
7
Guo K, Chen T, Ren S, Li N, Hu M, Kang J. Federated Learning Empowered Real-Time Medical Data Processing Method for Smart Healthcare. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:869-879. [PMID: 35737631 DOI: 10.1109/tcbb.2022.3185395] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 06/15/2023]
Abstract
Computer-aided diagnosis (CAD) has always been an important research topic for applying artificial intelligence in smart healthcare. Sufficient medical data are one of the most critical factors in CAD research. However, medical data are usually obtained in chronological order and cannot be collected all at once, which poses difficulties for the application of deep learning technology in the medical field. The traditional batch learning method consumes considerable time and space resources for real-time medical data, and the incremental learning method often leads to catastrophic forgetting. To solve these problems, we propose a real-time medical data processing method based on federated learning. We divide the process into the model stage and the exemplar stage. In the model stage, we use the federated learning method to fuse the old and new models to mitigate the catastrophic forgetting problem of the new model. In the exemplar stage, we use the most representative exemplars selected from the old data to help the new model review the old knowledge, which further mitigates the catastrophic forgetting problem of the new model. We use this method to conduct experiments on a simulated medical real-time data stream. The experimental results show that our method can learn a disease diagnosis model from a continuous medical real-time data stream. As the amount of data increases, the performance of the disease diagnosis model continues to improve, and the catastrophic forgetting problem has been effectively mitigated. Compared with the traditional batch learning method, our method can significantly save time and space resources.
8
Chai H, Huang Y, Xu L, Song X, He M, Wang Q. A decentralized federated learning-based cancer survival prediction method with privacy protection. Heliyon 2024; 10:e31873. [PMID: 38845954 PMCID: PMC11153246 DOI: 10.1016/j.heliyon.2024.e31873] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 05/18/2024] [Accepted: 05/23/2024] [Indexed: 06/09/2024] Open
Abstract
Background Survival prediction is one of the crucial goals in precision medicine, as accurate survival assessment can aid physicians in selecting appropriate treatment for individual patients. To achieve this aim, extensive data must be utilized to train the prediction model and prevent overfitting. However, the collection of patient data for disease prediction is challenging due to potential variations in data sources across institutions and concerns regarding privacy and ownership issues in data sharing. To facilitate the integration of cancer data from different institutions without violating privacy laws, we developed a federated learning-based data integration framework called AdFed, which can be used to evaluate patients' survival while considering the privacy protection problem by utilizing decentralized federated learning technology and a regularization method. Results AdFed was tested on different cancer datasets that contain patients' information from different institutions. The experimental results show that AdFed using distributed data can achieve better performance in cancer survival prediction (AUC = 0.605) than the compared federated-learning-based methods (average AUC = 0.554). Additionally, to assess the biological interpretability of our method, in the case study we list 10 identified genes related to liver cancer selected by AdFed, among which 5 genes have been confirmed by literature review. Conclusions The results indicate that AdFed outperforms other federated-learning-based methods, and the interpretable algorithm can select biologically significant genes and pathways while ensuring the confidentiality and integrity of data.
Affiliation(s)
- Hua Chai
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Yiqian Huang
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Lekai Xu
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Xinpeng Song
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Minfan He
  - School of Mathematics and Big Data, Foshan University, Foshan, 528000, China
- Qingyong Wang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
  - Anhui Provincial Engineering Research Center for Agricultural Information Perception and Intelligent Computing, Hefei, 230036, China
9
Zhang F, Kreuter D, Chen Y, Dittmer S, Tull S, Shadbahr T, Preller J, Rudd JH, Aston JA, Schönlieb CB, Gleadall N, Roberts M. Recent methodological advances in federated learning for healthcare. PATTERNS (NEW YORK, N.Y.) 2024; 5:101006. [PMID: 39005485 PMCID: PMC11240178 DOI: 10.1016/j.patter.2024.101006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/16/2024]
Abstract
For healthcare datasets, it is often impossible to combine data samples from multiple sites due to ethical, privacy, or logistical concerns. Federated learning allows for the utilization of powerful machine learning algorithms without requiring the pooling of data. Healthcare data have many simultaneous challenges, such as highly siloed data, class imbalance, missing data, distribution shifts, and non-standardized variables, that require new methodologies to address. Federated learning adds significant methodological complexity to conventional centralized machine learning, requiring distributed optimization, communication between nodes, aggregation of models, and redistribution of models. In this systematic review, we consider all papers on Scopus published between January 2015 and February 2023 that describe new federated learning methodologies for addressing challenges with healthcare data. We reviewed 89 papers meeting these criteria. Significant systemic issues were identified throughout the literature, compromising many methodologies reviewed. We give detailed recommendations to help improve methodology development for federated learning in healthcare.
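The "aggregation of models" step mentioned in this abstract can be sketched with federated averaging (FedAvg), a widely used baseline in which the server averages client parameters weighted by each client's sample count. This is an illustrative assumption, not a method attributed to the review. The function name, the flat-list representation of model parameters, and the toy values are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors (FedAvg-style)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same parameter shape")
    # Each global parameter is the weighted mean of the clients' values.
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Toy example: two hypothetical sites holding 1 and 3 samples respectively.
site_weights = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [1, 3]
global_weights = federated_average(site_weights, site_sizes)  # [2.5, 3.5]
```

Only parameters and sample counts reach the server in this sketch; the raw records stay at each site. Weighting by sample count lets larger sites contribute proportionally more, which is also why class imbalance and distribution shift across sites can bias the aggregated model.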
Affiliation(s)
- Fan Zhang
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Daniel Kreuter
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Yichen Chen
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Sören Dittmer
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
  - ZeTeM, University of Bremen, Bremen, Germany
- Samuel Tull
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Tolou Shadbahr
  - Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
- Jacobus Preller
  - Addenbrooke’s Hospital, Cambridge University Hospitals NHS Trust, Cambridge, UK
- James H.F. Rudd
  - Department of Medicine, University of Cambridge, Cambridge, UK
- John A.D. Aston
  - Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK
- Carola-Bibiane Schönlieb
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
- Michael Roberts
  - Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
  - Department of Medicine, University of Cambridge, Cambridge, UK
10
Seoni S, Shahini A, Meiburger KM, Marzola F, Rotunno G, Acharya UR, Molinari F, Salvi M. All you need is data preparation: A systematic review of image harmonization techniques in Multi-center/device studies for medical support systems. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 250:108200. [PMID: 38677080 DOI: 10.1016/j.cmpb.2024.108200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2024] [Revised: 04/20/2024] [Accepted: 04/22/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND AND OBJECTIVES Artificial intelligence (AI) models trained on multi-centric and multi-device studies can provide more robust insights and research findings compared to single-center studies. However, variability in acquisition protocols and equipment can introduce inconsistencies that hamper the effective pooling of multi-source datasets. This systematic review evaluates strategies for image harmonization, which standardizes appearances to enable reliable AI analysis of multi-source medical imaging. METHODS A literature search using PRISMA guidelines was conducted to identify relevant papers published between 2013 and 2023 analyzing multi-centric and multi-device medical imaging studies that utilized image harmonization approaches. RESULTS Common image harmonization techniques included grayscale normalization (improving classification accuracy by up to 24.42 %), resampling (increasing the percentage of robust radiomics features from 59.5 % to 89.25 %), and color normalization (enhancing AUC by up to 0.25 in external test sets). Initially, mathematical and statistical methods dominated, but machine and deep learning adoption has risen recently. Color imaging modalities like digital pathology and dermatology have remained prominent application areas, though harmonization efforts have expanded to diverse fields including radiology, nuclear medicine, and ultrasound imaging. In all the modalities covered by this review, image harmonization improved AI performance, with increases of up to 24.42 % in classification accuracy and 47 % in segmentation Dice scores. CONCLUSIONS Continued progress in image harmonization represents a promising strategy for advancing healthcare by enabling large-scale, reliable analysis of integrated multi-source datasets using AI.
Standardizing imaging data across clinical settings can help realize personalized, evidence-based care supported by data-driven technologies while mitigating biases associated with specific populations or acquisition protocols.
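To make the idea of grayscale normalization concrete, the sketch below uses z-score intensity normalization, one common variant. This choice is an assumption for illustration, since the abstract does not say which normalization schemes the reviewed studies used. The function name and the toy intensities are hypothetical.

```python
from statistics import mean, pstdev


def zscore_normalize(pixels):
    """Rescale intensities to zero mean and unit standard deviation."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        # A constant image carries no contrast; map every pixel to zero.
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]


# Toy example: the same tissue imaged by two hypothetical devices that
# differ only by a linear gain and offset in intensity.
site_a = [10, 20, 30]
site_b = [2 * p + 100 for p in site_a]

normalized_a = zscore_normalize(site_a)
normalized_b = zscore_normalize(site_b)
```

After normalization the two toy images coincide, because z-scoring removes any linear gain and offset. Real inter-device differences are generally not purely linear, so this illustrates the principle rather than serving as a complete harmonization method.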
Affiliation(s)
- Silvia Seoni
  - Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Alen Shahini
  - Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Kristen M Meiburger
  - Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Francesco Marzola
  - Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Giulia Rotunno
  - Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- U Rajendra Acharya
  - School of Mathematics, Physics and Computing, University of Southern Queensland, Springfield, Australia; Centre for Health Research, University of Southern Queensland, Australia
- Filippo Molinari
  - Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy
- Massimo Salvi
  - Biolab, PolitoBIOMedLab, Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy.
| |
|
11
|
Xiao T, Kong S, Zhang Z, Hua D, Liu F. A review of big data technology and its application in cancer care. Comput Biol Med 2024; 176:108577. [PMID: 38739981 DOI: 10.1016/j.compbiomed.2024.108577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2023] [Revised: 05/07/2024] [Accepted: 05/07/2024] [Indexed: 05/16/2024]
Abstract
The development of modern medical devices and information technology has led to rapid growth in the amount of health-related data available. The concept of medical big data has emerged globally, along with significant advances in cancer care that rely on data-driven approaches. However, outstanding issues such as fragmented data governance, low-quality data specifications, and data lock-in still make sharing challenging. Big data technology provides solutions for managing massive heterogeneous data and, combined with artificial intelligence (AI) techniques such as machine learning (ML) and deep learning (DL), helps to better mine the intrinsic connections within the data. This paper surveys and organizes recent articles on big data technology and its applications in cancer, dividing them into three types to outline their primary content and summarize their critical role in assisting cancer care. It then examines the latest research directions in big data technology in cancer and evaluates the current state of development of each type of application. Finally, current challenges and opportunities are discussed, and recommendations are made for the further integration of big data technology into the medical industry.
Affiliation(s)
- Tianyun Xiao
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, 063210, China; The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, Hebei, 063210, China; College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China
| | - Shanshan Kong
- College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China.
| | - Zichen Zhang
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, 063210, China; The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, Hebei, 063210, China; College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China
| | - Dianbo Hua
- Beijing Sitairui Cancer Data Analysis Joint Laboratory, Beijing, 101149, China
| | - Fengchun Liu
- Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan, Hebei, 063210, China; The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan, Hebei, 063210, China; College of Science, North China University of Science and Technology, Tangshan, Hebei, 063210, China; Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan, Hebei, China; Tangshan Intelligent Industry and Image Processing Technology Innovation Center, North China University of Science and Technology, Tangshan, Hebei, China
| |
|
12
|
Sadeghi S, Hempel L, Rodemund N, Kirsten T. Salzburg Intensive Care database (SICdb): a detailed exploration and comparative analysis with MIMIC-IV. Sci Rep 2024; 14:11438. [PMID: 38763952 PMCID: PMC11102905 DOI: 10.1038/s41598-024-61380-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 05/06/2024] [Indexed: 05/21/2024] Open
Abstract
The utilization of artificial intelligence (AI) in healthcare is on the rise, demanding increased accessibility to (public) medical data for benchmarking. The digitization of healthcare in recent years has facilitated medical data scientists' access to extensive hospital data, fostering AI-based research. A notable addition to this trend is the Salzburg Intensive Care database (SICdb), made publicly available in early 2023. Covering over 27,000 intensive care admissions at the University Hospital Salzburg from 2013 to 2021, this dataset presents a valuable resource for AI-driven investigations. This article explores the SICdb and conducts a comparative analysis with the widely recognized Medical Information Mart for Intensive Care, version IV (MIMIC-IV) database. The comparison focuses on key aspects, emphasizing the availability and granularity of data provided by the SICdb, particularly vital signs and laboratory measurements. The analysis demonstrates that the SICdb offers more detailed information than MIMIC-IV, with higher data availability and temporal resolution for signal data, especially vital signs. This is advantageous for longitudinal studies of patients' health conditions in the intensive care unit. The SICdb provides a valuable resource for medical data scientists and researchers. The database offers comprehensive and diverse healthcare data from a European country, making it well suited for benchmarking and enhancing AI-based healthcare research. The findings emphasize the importance of ongoing efforts to expand public datasets and make them available for advancing AI applications in the healthcare domain.
Affiliation(s)
- Sina Sadeghi
- Department for Medical Data Science, Leipzig University Medical Center, Leipzig, Germany.
- Institute for Medical Informatics, Statistics and Epidemiology, Leipzig University, Leipzig, Germany.
| | - Lars Hempel
- Department for Medical Data Science, Leipzig University Medical Center, Leipzig, Germany
- Institute for Medical Informatics, Statistics and Epidemiology, Leipzig University, Leipzig, Germany
- Faculty Applied Computer and Bio Sciences, Mittweida University of Applied Sciences, Mittweida, Germany
| | - Niklas Rodemund
- Department of Anaesthesiology, Perioperative Medicine and Intensive Care Medicine, Paracelsus Medical University of Salzburg, Salzburg, Austria
| | - Toralf Kirsten
- Department for Medical Data Science, Leipzig University Medical Center, Leipzig, Germany
- Institute for Medical Informatics, Statistics and Epidemiology, Leipzig University, Leipzig, Germany
- Faculty Applied Computer and Bio Sciences, Mittweida University of Applied Sciences, Mittweida, Germany
| |
|
13
|
Babar M, Qureshi B, Koubaa A. Investigating the impact of data heterogeneity on the performance of federated learning algorithm using medical imaging. PLoS One 2024; 19:e0302539. [PMID: 38748657 PMCID: PMC11095741 DOI: 10.1371/journal.pone.0302539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 04/09/2024] [Indexed: 05/19/2024] Open
Abstract
In recent years, Federated Learning (FL) has gained traction as a privacy-centric approach in medical imaging. This study explores the challenges that data heterogeneity poses for FL algorithms, using the COVIDx CXR-3 dataset as a case study. We contrast the performance of the Federated Averaging (FedAvg) algorithm on non-independent and identically distributed (non-IID) data against independent and identically distributed (IID) data. Our findings reveal a notable performance decline with increased data heterogeneity, emphasizing the need for innovative strategies to enhance FL in diverse environments. This research contributes to the practical implementation of FL, extending beyond theoretical concepts and addressing the nuances of medical imaging applications. By uncovering the inherent challenges that data diversity poses for FL, it sets the stage for future advancements in FL strategies to effectively manage data heterogeneity, especially in sensitive fields like healthcare.
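For readers unfamiliar with the aggregation step of FedAvg mentioned in the abstract above, the following is a minimal sketch of it: the server averages client parameters, weighting each client by its local sample count. The function name `fedavg`, the flat parameter vectors, and the two toy clients are assumptions made for illustration; this is not the study's implementation.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors.

    client_weights: list of equally shaped parameter arrays (hypothetical)
    client_sizes:   number of local training samples per client
    """
    sizes = np.asarray(client_sizes, dtype=float)
    stacked = np.stack([np.asarray(w, dtype=float) for w in client_weights])
    coeffs = sizes / sizes.sum()   # each client's share of the total data
    return coeffs @ stacked        # weighted sum over clients

# Two hypothetical clients holding 1 and 3 samples respectively
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because the average is weighted by sample count, a client whose local data are skewed (non-IID) but large can pull the global model toward its own distribution, which is one intuition for the performance decline the abstract reports.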
Affiliation(s)
- Muhammad Babar
- Robotics and Internet of Things Lab, Prince Sultan University, Riyadh, Saudi Arabia
| | - Basit Qureshi
- College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia
| | - Anis Koubaa
- Robotics and Internet of Things Lab, Prince Sultan University, Riyadh, Saudi Arabia
| |
|
14
|
Nguyen TPV, Yang W, Tang Z, Xia X, Mullens AB, Dean JA, Li Y. Lightweight federated learning for STIs/HIV prediction. Sci Rep 2024; 14:6560. [PMID: 38503789 PMCID: PMC10950866 DOI: 10.1038/s41598-024-56115-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 03/01/2024] [Indexed: 03/21/2024] Open
Abstract
This paper presents a solution that prioritises strong privacy protection and improves communication throughput for predicting the risk of sexually transmissible infections/human immunodeficiency virus (STIs/HIV). The approach utilised Federated Learning (FL) to construct a model from multiple clinics and key stakeholders. FL ensured that only models were shared between clinics, minimising the risk of personal information leakage. Additionally, an algorithm was explored on the FL manager side to construct a global model that aligns with the communication status of the system. Our proposed method introduced Random Forest Federated Learning for assessing the risk of STIs/HIV, incorporating a flexible aggregation process that can be adjusted to the capacity of the communication system. Experimental results demonstrated the significant potential of the solution for estimating STIs/HIV risk. In comparison with recent studies, our approach yielded superior results in terms of AUC (0.97) and accuracy (93%). Despite these promising findings, a limitation of the study lies in the experiments on men's data, which may be subject to participant bias owing to the self-reported nature of the data and its sensitive content. Future research could evaluate the proposed framework in partnership with high-risk populations (e.g., men who have sex with men) to provide a more comprehensive understanding of its impact, with the ultimate aim of improving health outcomes and optimising health services.
Affiliation(s)
- Thi Phuoc Van Nguyen
- School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, Toowoomba, 4350, QLD, Australia.
| | - Wencheng Yang
- School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, Toowoomba, 4350, QLD, Australia
| | - Zhaohui Tang
- School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, Toowoomba, 4350, QLD, Australia
| | - Xiaoyu Xia
- School of Computing Technologies, RMIT University, GPO Box 2476, Melbourne, 3001, VIC, Australia
| | - Amy B Mullens
- School of Psychology and Wellbeing, Institute for Resilient Regions, Centre for Health Research, University of Southern Queensland, Ipswich Campus, Ipswich, 4305, Australia
| | - Judith A Dean
- School of Public Health, Faculty of Medicine, The University of Queensland, Herston Road, Brisbane, 4006, QLD, Australia
| | - Yan Li
- School of Mathematics, Physics and Computing, Centre for Health Research, University of Southern Queensland, Toowoomba Campus, Toowoomba, 4350, QLD, Australia
| |
|
15
|
Tripathy SS, Bebortta S, Chowdhary CL, Mukherjee T, Kim S, Shafi J, Ijaz MF. FedHealthFog: A federated learning-enabled approach towards healthcare analytics over fog computing platform. Heliyon 2024; 10:e26416. [PMID: 38468957 PMCID: PMC10925998 DOI: 10.1016/j.heliyon.2024.e26416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 01/13/2024] [Accepted: 02/13/2024] [Indexed: 03/13/2024] Open
Abstract
The emergence of federated learning (FL) techniques in fog-enabled healthcare systems has enhanced privacy in safeguarding sensitive patient information across heterogeneous computing platforms. In this paper, we introduce the FedHealthFog framework, which was developed to overcome the difficulties of distributed learning in resource-constrained IoT-enabled healthcare systems, particularly those sensitive to delays and energy efficiency. Conventional federated learning approaches face challenges stemming from substantial compute requirements and significant communication costs. This is primarily due to their reliance on a single server for global aggregation, which results in inefficient model training. We address these problems by elevating strategically placed fog nodes to the role of local aggregators within the federated learning architecture. A greedy heuristic is used to optimize the choice of a fog node as the global aggregator in each communication cycle between edge devices and the cloud. FedHealthFog reduces communication latency by 87.01%, 26.90%, and 71.74% and energy consumption by 57.98%, 34.36%, and 35.37%, respectively, with respect to the three benchmark algorithms analyzed in this study. Our experiments against cutting-edge alternatives support the effectiveness of FedHealthFog, which also reduces the number of global aggregation cycles. These findings highlight FedHealthFog's potential to improve federated learning in resource-constrained IoT environments for delay-sensitive applications.
Affiliation(s)
| | - Sujit Bebortta
- Department of Computer Science, Ravenshaw University, Cuttack, 753003, India
| | - Chiranji Lal Chowdhary
- School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
| | - Tanmay Mukherjee
- Department of Computer Science and Engineering, Siksha 'O' Anusandhan (Deemed to be) University, Bhubaneswar, 751030, India
| | - SeongKi Kim
- Department of Computer Engineering, Chosun University, Gwangju 61452, South Korea
| | - Jana Shafi
- Department of Computer Engineering and Information, College of Engineering in Wadi Alddawasir, Prince Sattam Bin Abdulaziz University, Wadi Alddawasir, 11991, Saudi Arabia
| | - Muhammad Fazal Ijaz
- School of IT and Engineering, Melbourne Institute of Technology, Melbourne, 3000, Australia
| |
|
16
|
Danek BP, Makarious MB, Dadu A, Vitale D, Lee PS, Singleton AB, Nalls MA, Sun J, Faghri F. Federated learning for multi-omics: A performance evaluation in Parkinson's disease. Patterns (New York, N.Y.) 2024; 5:100945. [PMID: 38487808 PMCID: PMC10935499 DOI: 10.1016/j.patter.2024.100945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Revised: 01/29/2024] [Accepted: 02/02/2024] [Indexed: 03/17/2024]
Abstract
While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to the sufficiently large, high-quality datasets needed to train ML models. Federated learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's disease prediction. We find that FL model performance tracks that of centrally trained ML models: the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, which is 0.014 ± 0.003 less than its centrally trained variation. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open-source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.
Affiliation(s)
- Benjamin P. Danek
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- DataTecnica, Washington, DC 20037, USA
| | - Mary B. Makarious
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
| | - Anant Dadu
- Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- DataTecnica, Washington, DC 20037, USA
| | - Dan Vitale
- Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- DataTecnica, Washington, DC 20037, USA
| | - Paul Suhwan Lee
- Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
| | - Andrew B. Singleton
- Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
| | - Mike A. Nalls
- Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- DataTecnica, Washington, DC 20037, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
| | - Jimeng Sun
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
| | - Faraz Faghri
- Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- DataTecnica, Washington, DC 20037, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
| |
|
17
|
Liu J, Peng C, Tan W, Shi C. Federated Learning Backdoor Attack Based on Frequency Domain Injection. Entropy (Basel, Switzerland) 2024; 26:164. [PMID: 38392419 PMCID: PMC10888216 DOI: 10.3390/e26020164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 02/07/2024] [Accepted: 02/10/2024] [Indexed: 02/24/2024]
Abstract
Federated learning (FL) is a distributed machine learning framework that enables scattered participants to collaboratively train machine learning models without revealing information to other participants. Due to its distributed nature, FL is susceptible to manipulation by malicious clients. These malicious clients can launch backdoor attacks by contaminating local data or tampering with local model gradients, thereby damaging the global model. However, existing backdoor attacks in distributed scenarios have several limitations. For example, (1) the triggers in distributed backdoor attacks are mostly visible and easily perceived by humans; (2) these triggers are mostly applied in the spatial domain, inevitably corrupting the semantic information of the contaminated pixels. To address these issues, this paper introduces a frequency-domain injection-based backdoor attack in FL. Specifically, by performing a Fourier transform, the trigger and the clean image are linearly mixed in the frequency domain, injecting the low-frequency information of the trigger into the clean image while preserving its semantic information. Experiments on multiple image classification datasets demonstrate that the proposed attack method is stealthier and more effective in FL scenarios than existing attack methods.
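The frequency-domain mixing described in the abstract above can be illustrated with a short sketch. This is an assumption-laden toy version, not the paper's algorithm: it treats images as single-channel arrays, mixes the full complex spectrum inside a square low-frequency window, and uses hypothetical parameters `alpha` (mixing ratio) and `radius` (window half-width, assumed not to exceed half the image size). The paper's exact mixing rule is not given in the abstract and may differ.

```python
import numpy as np

def inject_low_freq(clean, trigger, alpha=0.15, radius=2):
    """Linearly mix the low-frequency band of `trigger` into `clean`.

    Hypothetical illustration; assumes 2-D arrays of equal shape and
    radius <= min(height, width) // 2.
    """
    # Shift the spectra so the zero frequency sits at the array centre
    fc = np.fft.fftshift(np.fft.fft2(clean))
    ft = np.fft.fftshift(np.fft.fft2(trigger))
    h, w = clean.shape
    cy, cx = h // 2, w // 2
    # Square window around the centre selects the low frequencies
    mask = np.zeros((h, w), dtype=bool)
    mask[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1] = True
    mixed = fc.copy()
    mixed[mask] = (1 - alpha) * fc[mask] + alpha * ft[mask]
    # The window is symmetric, so the result is real up to rounding error
    return np.real(np.fft.ifft2(np.fft.ifftshift(mixed)))

rng = np.random.default_rng(0)
clean_img = rng.random((8, 8))
trigger_img = rng.random((8, 8))
poisoned = inject_low_freq(clean_img, trigger_img)
```

High-frequency content, which carries edges and fine detail, is left untouched in this sketch, which is one way to read the abstract's claim that semantic information is preserved while the trigger stays hard to see.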
Affiliation(s)
- Jiawang Liu
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
| | - Changgen Peng
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
| | - Weijie Tan
- State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, China
- Key Laboratory of Advanced Manufacturing Technology of Ministry of Education, Guizhou University, Guiyang 550025, China
| | - Chenghui Shi
- College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
| |
|
18
|
Danek B, Makarious MB, Dadu A, Vitale D, Lee PS, Nalls MA, Sun J, Faghri F. Federated Learning for multi-omics: a performance evaluation in Parkinson's disease. bioRxiv: the preprint server for biology 2024:2023.10.04.560604. [PMID: 37986893 PMCID: PMC10659429 DOI: 10.1101/2023.10.04.560604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
While machine learning (ML) research has recently grown in popularity, its application in the omics domain is constrained by access to the sufficiently large, high-quality datasets needed to train ML models. Federated Learning (FL) represents an opportunity to enable collaborative curation of such datasets among participating institutions. We compare the simulated performance of several models trained using FL against classically trained ML models on the task of multi-omics Parkinson's Disease prediction. We find that FL model performance tracks that of centrally trained ML models: the most performant FL model achieves an AUC-PR of 0.876 ± 0.009, which is 0.014 ± 0.003 less than its centrally trained variation. We also determine that the dispersion of samples within a federation plays a meaningful role in model performance. Our study implements several open-source FL frameworks and aims to highlight some of the challenges and opportunities when applying these collaborative methods in multi-omics studies.
Affiliation(s)
- Benjamin Danek
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- DataTecnica, Washington, DC, 20037, USA
| | - Mary B. Makarious
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London, UK
- UCL Movement Disorders Centre, University College London, London, UK
| | - Anant Dadu
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- DataTecnica, Washington, DC, 20037, USA
| | - Dan Vitale
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- DataTecnica, Washington, DC, 20037, USA
| | - Paul Suhwan Lee
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Mike A Nalls
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- DataTecnica, Washington, DC, 20037, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Jimeng Sun
- Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Champaign, IL, 61820, USA
| | - Faraz Faghri
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- DataTecnica, Washington, DC, 20037, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, 20892, USA
| |
|
19
|
Pezoulas VC, Kalatzis F, Exarchos TP, Goules A, Tzioufas AG, Fotiadis DI. FHBF: Federated hybrid boosted forests with dropout rates for supervised learning tasks across highly imbalanced clinical datasets. Patterns (New York, N.Y.) 2024; 5:100893. [PMID: 38264722 PMCID: PMC10801222 DOI: 10.1016/j.patter.2023.100893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 11/03/2023] [Accepted: 11/10/2023] [Indexed: 01/25/2024]
Abstract
Although several studies have deployed gradient boosting trees (GBT) as a robust classifier for federated learning tasks (federated GBT [FGBT]), even with dropout rates (federated gradient boosting trees with dropout rate [FDART]), none of them have investigated the overfitting effects of FGBT across heterogeneous and highly imbalanced datasets within federated environments nor the effect of dropouts in the loss function. In this work, we present the federated hybrid boosted forests (FHBF) algorithm, which incorporates a hybrid weight update approach to overcome ill-posed problems that arise from overfitting effects during the training across highly imbalanced datasets in the cloud. Eight case studies were conducted to stress the performance of FHBF against existing algorithms toward the development of robust AI models for lymphoma development across 18 European federated databases. Our results highlight the robustness of FHBF, yielding an average loss of 0.527 compared with FGBT (0.611) and FDART (0.584) with increased classification performance (0.938 sensitivity, 0.732 specificity).
Affiliation(s)
- Vasileios C. Pezoulas
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
| | - Fanis Kalatzis
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
| | - Themis P. Exarchos
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
- Department of Informatics, Ionian University, 49100 Corfu, Greece
| | - Andreas Goules
- Department of Pathophysiology, Faculty of Medicine, National and Kapodistrian University of Athens (NKUA), 15772 Athens, Greece
| | - Athanasios G. Tzioufas
- Department of Pathophysiology, Faculty of Medicine, National and Kapodistrian University of Athens (NKUA), 15772 Athens, Greece
| | - Dimitrios I. Fotiadis
- Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
- Biomedical Research Institute, FORTH, 45110 Ioannina, Greece
| |
|
20
|
Ghadi YY, Mazhar T, Shah SFA, Haq I, Ahmad W, Ouahada K, Hamam H. Integration of federated learning with IoT for smart cities applications, challenges, and solutions. PeerJ Comput Sci 2023; 9:e1657. [PMID: 38192447 PMCID: PMC10773731 DOI: 10.7717/peerj-cs.1657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Accepted: 09/29/2023] [Indexed: 01/10/2024]
Abstract
In the past few years, privacy concerns have grown, making the financial models of businesses more vulnerable to attack. In many cases, it is hard to overstate the importance of real-time monitoring with data from Internet of Things (IoT) devices. Both the makers and the users of IoT devices face major problems when they try to apply Artificial Intelligence (AI) techniques in real-world applications where data must be collected and processed at a central location. Federated learning (FL) provides a decentralized, cooperative AI approach that many AI-enabled IoT applications can use, because it can train AI models on distributed IoT devices without requiring them to share data. FL allows local models to be trained on local data and to share their knowledge to improve a global model. Shared learning also allows models to be trained using data from all over the world. This article looks at the IoT in all of its forms, including "smart" businesses, "smart" cities, "smart" transportation, and "smart" healthcare. It examines the safety problems that have arisen in the area of federated learning with IoT (FL-IoT). This research is needed because federated learning is a new technique and little work has been done on the challenges faced during its integration with IoT. It is also useful in real-world applications where encrypted data must be sent from one place to another. The intended audience of this article is researchers and graduate students.
Affiliation(s)
- Yazeed Yasin Ghadi
- Department of Computer Science and Software Engineering, Al Ain University, Abu Dhabi, UAE
| | - Tehseen Mazhar
- Department of Computer Science, Virtual University of Pakistan, Lahore, Punjab, Pakistan
| | - Syed Faisal Abbas Shah
- Department of Computer Science, Virtual University of Pakistan, Lahore, Punjab, Pakistan
| | - Inayatul Haq
- School of Electrical and Information Engineering, Zhengzhou University, Zhengzhou, Henan, China
| | - Wasim Ahmad
- Department of Computer Science and Information Technology, University of Malakand, Chakdara, Dir, Pakistan
| | - Khmaies Ouahada
- School of Electrical Engineering, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
| | - Habib Hamam
- School of Electrical Engineering, Department of Electrical and Electronic Engineering Science, University of Johannesburg, Johannesburg, South Africa
- Commune d'Akanda, International Institute of Technology and Management, BP Libreville, Estuaire, Gabon
- Faculty of Engineering, University of Moncton, Moncton, New Brunswick, Canada
- College of Computer Science and Engineering, University of Ha'il, Ha'il, Saudi Arabia
- Production & Skills Development, Spectrum of Knowledge Production & Skills Development, Sfax, Tunisia
| |
|
21
|
Sharma S, Guleria K. A comprehensive review on federated learning based models for healthcare applications. Artif Intell Med 2023; 146:102691. [PMID: 38042608 DOI: 10.1016/j.artmed.2023.102691] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/23/2023] [Revised: 10/22/2023] [Accepted: 10/22/2023] [Indexed: 12/04/2023]
Abstract
A disease is an abnormal condition that negatively impacts the functioning of the human body. Pathology determines the causes behind the disease and identifies its development mechanism and functional consequences. Each disease has different identification methods, including X-ray scans for pneumonia, COVID-19, and lung cancer, whereas biopsy and CT scans can identify the presence of skin cancer and Alzheimer's disease, respectively. Early disease detection leads to effective treatment and avoids lasting complications. Deep learning has provided a vast number of applications in the medical sector, resulting in accurate and reliable early disease predictions. These models are utilized in the healthcare industry to provide supplementary assistance to doctors in identifying the presence of diseases. They are mostly trained on secondary data sources, since healthcare institutions refrain from sharing patients' private data to ensure confidentiality; this limits the effectiveness of deep learning models, which require extensive training datasets to achieve optimal results. Federated learning handles the data in such a way that the privacy of a patient's data is not compromised. In this work, a wide variety of disease detection models trained through federated learning have been rigorously reviewed. This meta-analysis provides an in-depth review of the federated learning architectures, federated learning types, hyperparameters, dataset utilization details, aggregation techniques, performance measures, and augmentation methods applied in the existing models during the development phase. The review also highlights various open challenges associated with the disease detection models trained through federated learning for future research.
Affiliation(s)
- Shagun Sharma
- Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura 140401, Punjab, India
| | - Kalpna Guleria
- Chitkara University Institute of Engineering & Technology, Chitkara University, Rajpura 140401, Punjab, India.
| |
|
22
|
Li S, Liu P, Nascimento GG, Wang X, Leite FRM, Chakraborty B, Hong C, Ning Y, Xie F, Teo ZL, Ting DSW, Haddadi H, Ong MEH, Peres MA, Liu N. Federated and distributed learning applications for electronic health records and structured medical data: a scoping review. J Am Med Inform Assoc 2023; 30:2041-2049. [PMID: 37639629 PMCID: PMC10654866 DOI: 10.1093/jamia/ocad170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 07/19/2023] [Indexed: 08/31/2023] Open
Abstract
OBJECTIVES: Federated learning (FL) has gained popularity in clinical research in recent years to facilitate privacy-preserving collaboration. Structured data, one of the most prevalent forms of clinical data, has experienced significant growth in volume concurrently, notably with the widespread adoption of electronic health records in clinical practice. This review examines FL applications on structured medical data, identifies contemporary limitations, and discusses potential innovations.
MATERIALS AND METHODS: We searched 5 databases, SCOPUS, MEDLINE, Web of Science, Embase, and CINAHL, to identify articles that applied FL to structured medical data and reported results following the PRISMA guidelines. Each selected publication was evaluated from 3 primary perspectives, including data quality, modeling strategies, and FL frameworks.
RESULTS: Out of the 1193 papers screened, 34 met the inclusion criteria, with each article consisting of one or more studies that used FL to handle structured clinical/medical data. Of these, 24 utilized data acquired from electronic health records, with clinical predictions and association studies being the most common clinical research tasks that FL was applied to. Only one article exclusively explored the vertical FL setting, while the remaining 33 explored the horizontal FL setting, with only 14 discussing comparisons between single-site (local) and FL (global) analysis.
CONCLUSIONS: The existing FL applications on structured medical data lack sufficient evaluations of clinically meaningful benefits, particularly when compared to single-site analyses. Therefore, it is crucial for future FL applications to prioritize clinical motivations and develop designs and methodologies that can effectively support and aid clinical practice and research.
Affiliation(s)
- Siqi Li
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Pinyan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Gustavo G Nascimento
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Xinru Wang
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Fabio Renato Manzolli Leite
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Bibhas Chakraborty
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Department of Statistics and Data Science, National University of Singapore, Singapore 117546, Singapore
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, United States
| | - Chuan Hong
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27708, United States
| | - Yilin Ning
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Feng Xie
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Zhen Ling Teo
- Singapore National Eye Centre, Singapore, Singapore Eye Research Institute, Singapore 168751, Singapore
| | - Daniel Shu Wei Ting
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Singapore National Eye Centre, Singapore, Singapore Eye Research Institute, Singapore 168751, Singapore
| | - Hamed Haddadi
- Department of Computing, Imperial College London, London SW7 2AZ, England, United Kingdom
| | - Marcus Eng Hock Ong
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Department of Emergency Medicine, Singapore General Hospital, Singapore 169608, Singapore
| | - Marco Aurélio Peres
- National Dental Research Institute Singapore, National Dental Centre Singapore, Singapore 168938, Singapore
- Oral Health Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
| | - Nan Liu
- Centre for Quantitative Medicine, Duke-NUS Medical School, Singapore 169857, Singapore
- Programme in Health Services and Systems Research, Duke-NUS Medical School, Singapore 169857, Singapore
- Institute of Data Science, National University of Singapore, Singapore 117602, Singapore
| |
|
23
|
Meskó B. The Impact of Multimodal Large Language Models on Health Care's Future. J Med Internet Res 2023; 25:e52865. [PMID: 37917126 PMCID: PMC10654899 DOI: 10.2196/52865] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 09/18/2023] [Revised: 10/10/2023] [Accepted: 10/12/2023] [Indexed: 11/03/2023] Open
Abstract
When large language models (LLMs) were introduced to the public at large in late 2022 with ChatGPT (OpenAI), the interest was unprecedented, with more than 1 billion unique users within 90 days. Until the introduction of Generative Pre-trained Transformer 4 (GPT-4) in March 2023, these LLMs only contained a single mode: text. As medicine is a multimodal discipline, the potential future versions of LLMs that can handle multimodality (meaning that they could interpret and generate not only text but also images, videos, sound, and even comprehensive documents) can be conceptualized as a significant evolution in the field of artificial intelligence (AI). This paper zooms in on the new potential of generative AI, a new form of AI that also includes tools such as LLMs, through the achievement of multimodal inputs of text, images, and speech on health care's future. We present several futuristic scenarios to illustrate the potential path forward as multimodal LLMs (M-LLMs) could represent the gateway between health care professionals and using AI for medical purposes. It is important to point out, though, that despite the unprecedented potential of generative AI in the form of M-LLMs, the human touch in medicine remains irreplaceable. AI should be seen as a tool that can augment health care professionals rather than replace them. It is also important to consider the human aspects of health care (empathy, understanding, and the doctor-patient relationship) when deploying AI.
|
24
|
Subashchandrabose U, John R, Anbazhagu UV, Venkatesan VK, Thyluru Ramakrishna M. Ensemble Federated Learning Approach for Diagnostics of Multi-Order Lung Cancer. Diagnostics (Basel) 2023; 13:3053. [PMID: 37835796 PMCID: PMC10572651 DOI: 10.3390/diagnostics13193053] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/26/2023] [Revised: 09/20/2023] [Accepted: 09/24/2023] [Indexed: 10/15/2023] Open
Abstract
The early detection and classification of lung cancer is crucial for improving a patient's outcome. However, traditional classification methods are based on single machine learning models and are therefore limited by the availability and quality of data at the centralized computing server. In this paper, we propose an ensemble Federated Learning-based approach for multi-order lung cancer classification. This approach combines multiple machine learning models trained on different datasets, improving accuracy and generalization. Moreover, the Federated Learning approach enables the use of distributed data while ensuring data privacy and security. We evaluate the approach on a Kaggle cancer dataset and compare the results with traditional machine learning models. The results demonstrate an accuracy of 89.63% for lung cancer classification.
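The abstract does not say how the individual models are combined. As a minimal sketch of one common way to build such an ensemble, the Python snippet below applies majority voting to the class labels predicted by several models. The function name `majority_vote` and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class labels from several models by majority vote.

    predictions_per_model: list of equal-length label lists, one per model
    (each model is assumed to have been trained on a different dataset).
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        # Count the label each model assigned to sample i.
        votes = Counter(model[i] for model in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Three hypothetical models voting on three samples.
ensemble_labels = majority_vote([[0, 1, 2], [0, 1, 1], [1, 1, 2]])
```

Majority voting only needs each site's predicted labels, so no site has to expose its training data to build the ensemble.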
Affiliation(s)
| | - Rajan John
- Department of Computer Science, College of Computer Science and Information Technology, Jazan University, Jazan 45142, Saudi Arabia
| | - Usha Veerasamy Anbazhagu
- Department of Computing Technologies, School of Computing, Faculty of Engineering and Technology, SRM Institute of Science and Technology, SRM Nagar, Kattankulathur, Chennai 603203, India
| | - Vinoth Kumar Venkatesan
- School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore 632014, India
| | - Mahesh Thyluru Ramakrishna
- Department of Computer Science and Engineering, Faculty of Engineering and Technology, JAIN (Deemed-to-Be University), Bangalore 560066, India
| |
|
25
|
Wang Q, He M, Guo L, Chai H. AFEI: adaptive optimized vertical federated learning for heterogeneous multi-omics data integration. Brief Bioinform 2023; 24:bbad269. [PMID: 37497720 DOI: 10.1093/bib/bbad269] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 06/26/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023] Open
Abstract
Vertical federated learning has gained popularity as a means of enabling collaboration and information sharing between different entities while maintaining data privacy and security. This approach has potential applications in healthcare, cancer prognosis prediction, and other fields where data privacy is a major concern. Although using multi-omics data for cancer prognosis prediction provides more information for treatment selection, collecting different types of omics data can be challenging because they are produced in various medical institutions. Data owners must comply with strict data protection regulations such as the European Union (EU) General Data Protection Regulation. To share patient data across multiple institutions, privacy and security issues must be addressed. Therefore, we propose AFEI, an adaptive optimized vertical federated learning framework for heterogeneous multi-omics data integration, to integrate multi-omics data collected from multiple institutions for cancer prognosis prediction. AFEI enables participating parties to build an accurate joint evaluation model for learning more information related to cancer patients from different perspectives, based on the distributed and encrypted multi-omics features shared by multiple institutions. The experimental results demonstrate that AFEI achieves higher prediction accuracy (6.5% on average) than using single omics data by utilizing the encrypted multi-omics data from different institutions, and it performs almost as well as prognosis prediction by directly integrating multi-omics data. Overall, AFEI can be seen as an efficient solution for breaking down barriers to multi-institutional collaboration and promoting the development of cancer prognosis prediction.
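To make the vertical setting concrete, the following Python sketch shows the basic idea for a linear model: each institution holds different feature columns for the same patients, computes a partial score locally, and only those partial scores are combined. This illustrates vertical partitioning in general, not the AFEI method. Encryption and the adaptive optimization described in the abstract are omitted, and all function names are hypothetical.

```python
import numpy as np


def party_partial_score(features, weights):
    """Computed locally by one institution on its own feature columns."""
    return features @ weights


def joint_prediction(partial_scores, bias=0.0):
    """Coordinator sums the per-party scores and applies a sigmoid.

    Only the partial scores leave each institution, never the raw features.
    (A real system would additionally protect these scores, e.g. by encryption.)
    """
    z = np.sum(partial_scores, axis=0) + bias
    return 1.0 / (1.0 + np.exp(-z))


# Two hypothetical institutions holding different omics features
# for the same five patients (rows are aligned by patient).
rng = np.random.default_rng(0)
omics_a, weights_a = rng.normal(size=(5, 3)), rng.normal(size=3)
omics_b, weights_b = rng.normal(size=(5, 2)), rng.normal(size=2)

risk = joint_prediction(
    [party_partial_score(omics_a, weights_a),
     party_partial_score(omics_b, weights_b)],
    bias=0.1,
)
```

For a linear model, the summed partial scores equal the score a centralized model would compute on the concatenated features, which is why the split loses no predictive information in this simplified case.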
Affiliation(s)
- Qingyong Wang
- School of Information and Computer, Anhui Agricultural University, Hefei 230000, China
| | - Minfan He
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
| | - Longyi Guo
- Guangdong Provincial Hospital of Traditional Chinese Medical, Guangzhou 510000, China
| | - Hua Chai
- School of Mathematics and Big Data, Foshan University, Foshan 528000, China
| |
|
26
|
Khajehali N, Yan J, Chow YW, Fahmideh M. A Comprehensive Overview of IoT-Based Federated Learning: Focusing on Client Selection Methods. Sensors (Basel) 2023; 23:7235. [PMID: 37631771 PMCID: PMC10459674 DOI: 10.3390/s23167235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 08/12/2023] [Accepted: 08/14/2023] [Indexed: 08/27/2023]
Abstract
The integration of the Internet of Things (IoT) with machine learning (ML) is revolutionizing how services and applications impact our daily lives. In traditional ML methods, data are collected and processed centrally. However, modern IoT networks face challenges in implementing this approach due to their vast amount of data and privacy concerns. To overcome these issues, federated learning (FL) has emerged as a solution. FL allows ML methods to achieve collaborative training by transferring model parameters instead of client data. One of the significant challenges of federated learning is that IoT devices as clients usually have different computation and communication capacities in a dynamic environment. At the same time, their network availability is unstable, and their data quality varies. To achieve high-quality federated learning and handle these challenges, it is essential to design a proper client selection process and methods for selecting suitable clients from the candidates. This study presents a comprehensive systematic literature review (SLR) that focuses on the challenges of client selection (CS) in the context of federated learning (FL). The objective of this SLR is to facilitate future research and development of CS methods in FL. Additionally, a detailed and in-depth overview of the CS process is provided, encompassing its abstract implementation and essential characteristics. This comprehensive presentation enables the application of CS in diverse domains. Furthermore, various CS methods are thoroughly categorized and explained based on their key characteristics and their ability to address specific challenges. This categorization offers valuable insights into the current state of the literature while also providing a roadmap for prospective investigations in this area of research.
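The client properties named above (computation, communication, availability, data quality) suggest a simple score-based selection rule. The Python sketch below is one hypothetical instance, not a method from the review: the `Client` fields, the availability threshold, and the scoring weights are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    compute: float       # relative computation capacity, 0..1
    bandwidth: float     # relative communication capacity, 0..1
    availability: float  # fraction of time the device is reachable, 0..1
    data_quality: float  # quality estimate of the local data, 0..1


def select_clients(clients, k, min_availability=0.5):
    """Pick the k best-scoring clients among those that are reliably online."""
    def score(client):
        # Assumed weighting; a real system would tune or learn these values.
        return 0.3 * client.compute + 0.3 * client.bandwidth + 0.4 * client.data_quality

    eligible = [c for c in clients if c.availability >= min_availability]
    return sorted(eligible, key=score, reverse=True)[:k]


candidates = [
    Client("a", 0.9, 0.9, 0.9, 0.9),
    Client("b", 1.0, 1.0, 0.2, 1.0),  # strong device, but rarely online
    Client("c", 0.5, 0.5, 0.8, 0.5),
    Client("d", 0.1, 0.1, 0.9, 0.1),
]
chosen = select_clients(candidates, k=2)
```

Filtering on availability before ranking keeps a powerful but rarely reachable device such as `b` from stalling a training round.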
Affiliation(s)
- Naghmeh Khajehali
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
| | - Jun Yan
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
| | - Yang-Wai Chow
- School of Computing and Information Technology, University of Wollongong, Wollongong, NSW 2522, Australia
| | - Mahdi Fahmideh
- School of Business, University of Southern Queensland (USQ), Brisbane, QLD 4350, Australia
| |
|
27
|
Gu X, Sabrina F, Fan Z, Sohail S. A Review of Privacy Enhancement Methods for Federated Learning in Healthcare Systems. Int J Environ Res Public Health 2023; 20:6539. [PMID: 37569079 PMCID: PMC10418741 DOI: 10.3390/ijerph20156539] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/26/2023] [Revised: 07/11/2023] [Accepted: 08/04/2023] [Indexed: 08/13/2023]
Abstract
Federated learning (FL) provides a distributed machine learning system that enables participants to train on local data to create a shared model, eliminating the requirement of data sharing. In healthcare systems, FL allows models to be trained locally on data from Medical Internet of Things (MIoT) devices and electronic health records (EHRs) without sending patients' data to the central server. This allows healthcare decisions and diagnoses based on datasets from all participants, as well as streamlining other healthcare processes. In terms of user data privacy, this technology allows collaborative training without the need to share the local data with the central server. However, there are privacy challenges in FL arising from the fact that the model updates shared between the client and the server can be used to regenerate the client's data, breaching the privacy requirements of applications in domains like healthcare. In this paper, we have conducted a review of the literature to analyse the existing privacy and security enhancement methods proposed for FL in healthcare systems. It has been identified that the research in the domain focuses on seven techniques: Differential Privacy, Homomorphic Encryption, Blockchain, Hierarchical Approaches, Peer to Peer Sharing, Intelligence on the Edge Device, and Mixed, Hybrid and Miscellaneous Approaches. The strengths, limitations, and trade-offs of each technique were discussed, and the possible future for these seven privacy enhancement techniques for healthcare FL systems was identified.
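Of the seven techniques listed, differential privacy is the easiest to illustrate in a few lines. The Python sketch below shows the standard clip-and-add-Gaussian-noise step a client could apply to its model update before sending it to the server. The function name and parameters are hypothetical, and the sketch does no privacy accounting, so on its own it does not establish any specific privacy guarantee.

```python
import numpy as np


def privatize_update(update, clip_norm, noise_multiplier, rng):
    """Clip a model update to a maximum L2 norm, then add Gaussian noise.

    update:           1-D array holding the client's model update
    clip_norm:        maximum allowed L2 norm of the update
    noise_multiplier: noise standard deviation as a multiple of clip_norm
    rng:              numpy random Generator
    """
    norm = np.linalg.norm(update)
    # Scale down only if the update exceeds the clipping norm.
    clipped = update * min(1.0, clip_norm / max(norm, 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise


rng = np.random.default_rng(0)
local_update = np.array([3.0, 4.0])  # L2 norm 5.0
shared_update = privatize_update(local_update, clip_norm=1.0,
                                 noise_multiplier=1.0, rng=rng)
```

Clipping bounds how much any single client can influence the aggregate, and the noise scale is tied to that bound. A larger `noise_multiplier` hides individual contributions better at the cost of model accuracy, which is the trade-off the review discusses.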
Affiliation(s)
- Xin Gu
- School of Information Technology, King’s Own Institute, Sydney, NSW 2000, Australia
| | - Fariza Sabrina
- School of Engineering and Technology, Central Queensland University, Sydney, NSW 2000, Australia
| | - Zongwen Fan
- College of Computer Science and Technology, Huaqiao University, Xiamen 361021, China
| | - Shaleeza Sohail
- College of Engineering, Science and Environment, The University of Newcastle, Callaghan, NSW 2308, Australia
| |
|
28
|
Yao Z, Wang H, Yan W, Wang Z, Zhang W, Wang Z, Zhang G. Artificial intelligence-based diagnosis of Alzheimer's disease with brain MRI images. Eur J Radiol 2023; 165:110934. [PMID: 37354773 DOI: 10.1016/j.ejrad.2023.110934] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/16/2023] [Revised: 05/21/2023] [Accepted: 06/15/2023] [Indexed: 06/26/2023]
Abstract
Alzheimer's disease, a primary neurodegenerative condition, predominantly impacts the elderly and pre-elderly population. This progressive neurological disorder is characterized by an array of symptoms including memory loss, cognitive decline, and various physiological and psychological disturbances, significantly compromising the quality of life of patients and their caregivers. Recent advancements in Magnetic Resonance Imaging (MRI) technology have catalyzed research in AI-enhanced diagnostics for Alzheimer's disease, fostering optimism for early detection and timely interventions. This progress has paved the way for the development of sophisticated algorithms and models adept at analyzing complex brain imaging data, thereby augmenting diagnostic accuracy and efficiency. This advancement fuels optimism regarding the transformative potential of AI-driven diagnostics in revolutionizing Alzheimer's disease management, with the prospect of facilitating more effective treatment strategies and improved patient outcomes. The objective of this review is to provide a comprehensive overview of recent developments in deep learning methodologies applied to brain MRI images for the classification of various stages of Alzheimer's disease, with a particular emphasis on early diagnosis. Furthermore, this review underscores the limitations of current research, discussing potential challenges and future research directions in this dynamic field.
Affiliation(s)
- Zhaomin Yao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
| | - Hongyu Wang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
| | - Wencheng Yan
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
| | - Zheling Wang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
| | - Wenwen Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
| | - Zhiguo Wang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China.
| | - Guoxu Zhang
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China; Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China.
| |
|
29
|
Su Y, Huang C, Zhu W, Lyu X, Ji F. Multi-party Diabetes Mellitus risk prediction based on secure federated learning. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/29/2023]
|
30
|
Wang S, Zhu X. FedDNA: Federated learning using dynamic node alignment. PLoS One 2023; 18:e0288157. [PMID: 37399217 PMCID: PMC10317242 DOI: 10.1371/journal.pone.0288157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 06/20/2023] [Indexed: 07/05/2023] Open
Abstract
Federated Learning (FL), as a new computing framework, has received significant attention recently due to its advantages in preserving data privacy while training models with superb performance. During FL learning, distributed sites first learn their respective parameters. A central site then consolidates the learned parameters, using averaging or other approaches, and disseminates new weights across all sites to carry out the next round of learning. The distributed parameter learning and consolidation repeat in an iterative fashion until the algorithm converges or terminates. Many FL methods exist to aggregate weights from distributed sites, but most approaches use a static node alignment approach, where nodes of distributed networks are statically assigned, in advance, to match nodes and aggregate their weights. In reality, neural networks, especially dense networks, have nontransparent roles with respect to individual nodes. Combined with the random nature of the networks, static node matching often does not result in the best matching between nodes across sites. In this paper, we propose FedDNA, a dynamic node alignment federated learning algorithm. Our theme is to find the best matching nodes between different sites, and then aggregate the weights of matching nodes for federated learning. For each node in a neural network, we represent its weight values as a vector, and use a distance function to find the most similar nodes, i.e., nodes with the smallest distance from other sites. Because finding the best matching across all sites is computationally expensive, we further design a minimum spanning tree based approach to ensure that a node from each site will have matched peers from other sites, such that the total pairwise distances across all sites are minimized. Experiments and comparisons demonstrate that FedDNA outperforms commonly used baselines, such as FedAvg, for federated learning.
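The Python sketch below illustrates why node alignment matters before averaging. `fedavg` is a plain sample-weighted average of per-site weights. `align_to_reference` reorders one site's nodes to match a reference site using a simple greedy nearest-distance rule. That greedy rule is a stand-in chosen for brevity, not the minimum spanning tree procedure of FedDNA, and both function names are hypothetical.

```python
import numpy as np


def fedavg(site_weights, site_sizes):
    """Average per-site weight arrays, weighted by local sample counts."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)  # shape: (n_sites, ...)
    return np.tensordot(sizes / sizes.sum(), stacked, axes=1)


def align_to_reference(reference, other):
    """Greedily reorder the rows (nodes) of `other` to match `reference`.

    Each row is one node's weight vector; the closest pairs are matched first.
    """
    dist = np.linalg.norm(reference[:, None, :] - other[None, :, :], axis=2)
    order = np.full(len(reference), -1)
    for _ in range(len(reference)):
        i, j = np.unravel_index(np.argmin(dist), dist.shape)
        order[i] = j
        dist[i, :] = np.inf  # reference node i is now matched
        dist[:, j] = np.inf  # other-site node j is now used
    return other[order]


# Two sites learned the same two nodes, but in swapped order.
site_a = np.array([[1.0, 0.0], [0.0, 1.0]])
site_b = np.array([[0.0, 1.0], [1.0, 0.0]])

naive = fedavg([site_a, site_b], [10, 10])
aligned = fedavg([site_a, align_to_reference(site_a, site_b)], [10, 10])
```

In this toy case the naive average blurs both nodes to 0.5 in every entry, while the aligned average recovers the shared nodes exactly. In a real multi-layer network, permuting a layer's nodes also requires permuting the matching input weights of the next layer, which the sketch leaves out.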
Affiliation(s)
- Shuwen Wang
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, United States of America
| | - Xingquan Zhu
- Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, Florida, United States of America
| |
|
31
|
Tarumi S, Suzuki M, Yoshida H, Miyauchi S, Kurazume R. Personalized Federated Learning for Institutional Prediction Model using Electronic Health Records: A Covariate Adjustment Approach. Annu Int Conf IEEE Eng Med Biol Soc 2023; 2023:1-4. [PMID: 38083200 DOI: 10.1109/embc40787.2023.10339940] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
Federated learning (FL) has attracted attention as a technology that allows multiple medical institutions to collaborate on AI without disclosing each other's patient data. However, FL is unable to learn robustly when the data of participating clients are not independently and identically distributed (Non-IID). Personalized Federated Learning (PFL), which constructs a personalized model for each client, has been proposed as a solution to this problem. However, conventional PFL methods do not ensure the interpretability of personalization, specifically, the identification of which data samples contribute to each personalized learning, which is important for AI in medical applications. In this study, we propose a novel PFL framework, Federated Adjustment of Covariate (FedCov), which acquires a propensity score model representing the covariate shift among clients through prior FL, then learns a final model by weighting the contribution of each training sample to PFL based on the estimated propensity score. This approach enables both the learning of personalized models through covariate adjustment and the visualization of the contribution of each client to PFL. FedCov was evaluated in the prediction of in-hospital mortality across 50 hospitals in the eICU Collaborative Research Database, achieving an ROC-AUC of 0.750. This result outperformed the AUCs in the 0.720-0.735 range achieved by conventional FL methods and was closest to the AUC of 0.754 achieved by centralized learning. Clinical Relevance: This study demonstrates the feasibility of providing sophisticated and personalized AI-driven clinical decision support to any medical institution through personalized federated learning.
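The propensity-based weighting can be sketched as follows in Python. A logistic propensity model estimates how likely a sample is to come from the target institution, and each training sample is then weighted by the odds p / (1 - p), a common choice for covariate shift adjustment. Both the odds form and the pooled fitting used here are assumptions made for brevity. The abstract does not give FedCov's weighting formula, and it obtains the propensity model through federated learning rather than pooling.

```python
import numpy as np


def fit_propensity(x, is_target, lr=0.1, steps=2000):
    """Logistic regression by gradient descent.

    Models P(sample comes from the target institution | covariates x).
    """
    X = np.column_stack([np.ones(len(x)), x])  # prepend an intercept column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - is_target) / len(x)
    return w


def importance_weights(x, w, clip=10.0):
    """Weight each sample by its propensity odds, clipped for stability."""
    X = np.column_stack([np.ones(len(x)), x])
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return np.clip(p / (1.0 - p), 0.0, clip)


# Toy covariate shift: the target site has larger covariate values on average.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(1.0, 1.0, 200), rng.normal(-1.0, 1.0, 200)])
is_target = np.concatenate([np.ones(200), np.zeros(200)])

w = fit_propensity(x, is_target)
sample_weights = importance_weights(x, w)
```

Samples that resemble the target institution's population receive larger weights in the personalized model's loss. Inspecting which samples or source sites carry large weights is one way to realize the contribution visualization the abstract mentions.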
|
32
|
Diniz JM, Vasconcelos H, Souza J, Rb-Silva R, Ameijeiras-Rodriguez C, Freitas A. Comparing Decentralized Learning Methods for Health Data Models to Nondecentralized Alternatives: Protocol for a Systematic Review. JMIR Res Protoc 2023; 12:e45823. [PMID: 37335606 PMCID: PMC10337426 DOI: 10.2196/45823] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 04/27/2023] [Accepted: 04/28/2023] [Indexed: 06/21/2023] Open
Abstract
BACKGROUND: Considering the soaring health-related costs directed toward a growing, aging, and comorbid population, the health sector needs effective data-driven interventions while managing rising care costs. While health interventions using data mining have become more robust and adopted, they often demand high-quality big data. However, growing privacy concerns have hindered large-scale data sharing. In parallel, recently introduced legal instruments require complex implementations, especially when it comes to biomedical data. New privacy-preserving technologies, such as decentralized learning, make it possible to create health models without mobilizing data sets by using distributed computation principles. Several multinational partnerships, including a recent agreement between the United States and the European Union, are adopting these techniques for next-generation data science. While these approaches are promising, there is no clear and robust evidence synthesis of health care applications.
OBJECTIVE: The main aim is to compare the performance among health data models (eg, automated diagnosis and mortality prediction) developed using decentralized learning approaches (eg, federated and blockchain) to those using centralized or local methods. Secondary aims are comparing the privacy compromise and resource use among model architectures.
METHODS: We will conduct a systematic review using the first-ever registered research protocol for this topic following a robust search methodology, including several biomedical and computational databases. This work will compare health data models differing in development architecture, grouping them according to their clinical applications. For reporting purposes, a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 flow diagram will be presented. CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies)-based forms will be used for data extraction and to assess the risk of bias, alongside PROBAST (Prediction Model Risk of Bias Assessment Tool). All effect measures in the original studies will be reported.
RESULTS: The queries and data extractions are expected to start on February 28, 2023, and end by July 31, 2023. The research protocol was registered with PROSPERO, under the number 393126, on February 3, 2023. With this protocol, we detail how we will conduct the systematic review. With that study, we aim to summarize the progress and findings from state-of-the-art decentralized learning models in health care in comparison to their local and centralized counterparts. Results are expected to clarify the consensuses and heterogeneities reported and help guide the research and development of new robust and sustainable applications to address the health data privacy problem, with applicability in real-world settings.
CONCLUSIONS: We expect to clearly present the status quo of these privacy-preserving technologies in health care. With this robust synthesis of the currently available scientific evidence, the review will inform health technology assessment and evidence-based decisions, from health professionals, data scientists, and policy makers alike. Importantly, it should also guide the development and application of new tools in service of patients' privacy and future research.
TRIAL REGISTRATION: PROSPERO 393126; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=393126.
INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): PRR1-10.2196/45823.
Affiliation(s)
- José Miguel Diniz
  - CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
  - PhD Program in Health Data Science, Faculty of Medicine, University of Porto, Porto, Portugal
- Henrique Vasconcelos
  - CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
- Júlio Souza
  - CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
  - MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
- Rita Rb-Silva
  - MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
- Carolina Ameijeiras-Rodriguez
  - MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
- Alberto Freitas
  - CINTESIS-Centre for Health Technology and Services Research, Faculty of Medicine, University of Porto, Porto, Portugal
  - MEDCIDS-Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
|
33
|
Salmeron JL, Arévalo I, Ruiz-Celma A. Benchmarking federated strategies in Peer-to-Peer Federated learning for biomedical data. Heliyon 2023; 9:e16925. [PMID: 37332922 PMCID: PMC10272318 DOI: 10.1016/j.heliyon.2023.e16925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 05/31/2023] [Accepted: 06/01/2023] [Indexed: 06/20/2023] Open
Abstract
The increasing requirements for data protection and privacy have attracted a huge research interest on distributed artificial intelligence and specifically on federated learning, an emerging machine learning approach that allows the construction of a model between several participants who hold their own private data. In the initial proposal of federated learning the architecture was centralised and the aggregation was done with federated averaging, meaning that a central server will orchestrate the federation using the most straightforward averaging strategy. This research is focused on testing different federated strategies in a peer-to-peer environment. The authors propose various aggregation strategies for federated learning, including weighted averaging aggregation, using different factors and strategies based on participant contribution. The strategies are tested with varying data sizes to identify the most robust ones. This research tests the strategies with several biomedical datasets and the results of the experiments show that the accuracy-based weighted average outperforms the classical federated averaging method.
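The difference between classical federated averaging and an accuracy-based weighted average can be illustrated with a minimal sketch. The abstract does not specify the exact weighting scheme used by the authors, so the version below is an assumption: each participant's parameter vector is weighted either by its local sample count (federated averaging) or by its validation accuracy. The function names `weighted_average`, `fedavg`, and `accuracy_weighted` are hypothetical, and in a peer-to-peer setting each peer would run the aggregation locally rather than on a central server.

```python
from typing import Sequence


def weighted_average(models: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> list[float]:
    """Average parameter vectors, weighting each participant by `weights`."""
    if not models or len(models) != len(weights):
        raise ValueError("need one weight per model and at least one model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(models[0])
    return [
        sum(w * m[i] for m, w in zip(models, weights)) / total
        for i in range(dim)
    ]


def fedavg(models, sample_counts):
    """Classical federated averaging: weight by local dataset size."""
    return weighted_average(models, sample_counts)


def accuracy_weighted(models, accuracies):
    """Assumed variant: weight by each participant's validation accuracy."""
    return weighted_average(models, accuracies)


# Two participants with toy two-parameter models (illustrative values only).
models = [[1.0, 2.0], [3.0, 4.0]]
fedavg_result = fedavg(models, [100, 300])           # weights 0.25 and 0.75
accuracy_result = accuracy_weighted(models, [0.9, 0.6])  # weights 0.6 and 0.4
```

With these toy inputs the larger site dominates under federated averaging, while the more accurate site dominates under accuracy weighting, which is the trade-off the benchmark explores.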
Affiliation(s)
- Jose L. Salmeron
  - CUNEF Universidad, Madrid, Spain
  - Universidad Autónoma de Chile, Chile
|
34
|
Aydin A, Gürsoy A, Karal H. Mobile care app development process: using the ADDIE model to manage symptoms after breast cancer surgery (step 1). Discov Oncol 2023; 14:63. [PMID: 37160467 PMCID: PMC10169965 DOI: 10.1007/s12672-023-00676-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 04/28/2023] [Indexed: 05/11/2023] Open
Abstract
The use of mobile applications is widespread in patient monitoring or education today. This study aims to describe the design and development process of a mobile app that supports patient self-care after breast cancer surgery. We used the ADDIE model to develop and test the mobile app. ADDIE (Analysis, Design, Development, Implementation, Evaluation) is a systematic approach based on a standard instructional design model for creating training materials. The model consists of five phases, each with its own set of steps. Once the steps within each phase are completed, the model progresses to the next phase, ultimately resulting in a "usable" product. Different team collaborations were established within each phase, and support was obtained from multiple experts during the design process. Thanks to this model, the information that patients need was transformed into a technological product. This article, which explains the stages of the product design process for mobile applications, provides information that may be helpful to researchers working on similar products.
Affiliation(s)
- Aydanur Aydin
  - Faculty of Health Sciences, Nursing Department, Gumushane University, Gumushane, Turkey
- Ayla Gürsoy
  - Faculty of Health Sciences, Nursing Department, Antalya Bilim University, Antalya, Turkey
- Hasan Karal
  - Faculty of Education, Computer and Instructional Technologies Education, Trabzon University, Trabzon, Turkey
|
35
|
Cremonesi F, Planat V, Kalokyri V, Kondylakis H, Sanavia T, Miguel Mateos Resinas V, Singh B, Uribe S. The need for multimodal health data modeling: a practical approach for a federated-learning healthcare platform. J Biomed Inform 2023; 141:104338. [PMID: 37023843 DOI: 10.1016/j.jbi.2023.104338] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 07/19/2022] [Revised: 03/06/2023] [Accepted: 03/11/2023] [Indexed: 04/08/2023]
Abstract
Federated learning initiatives in healthcare are being developed to collaboratively train predictive models without the need to centralize sensitive personal data. GenoMed4All is one such project, with the goal of connecting European clinical and -omics data repositories on rare diseases through a federated learning platform. Currently, the consortium faces the challenge of a lack of well-established international datasets and interoperability standards for federated learning applications on rare diseases. This paper presents our practical approach to select and implement a Common Data Model (CDM) suitable for the federated training of predictive models applied to the medical domain, during the initial design phase of our federated learning platform. We describe our selection process, composed of identifying the consortium's needs, reviewing our functional and technical architecture specifications, and extracting a list of business requirements. We review the state of the art and evaluate three widely-used approaches (FHIR, OMOP and Phenopackets) based on a checklist of requirements and specifications. We discuss the pros and cons of each approach considering the use cases specific to our consortium as well as the generic issues of implementing a European federated learning healthcare platform. A list of lessons learned from the experience in our consortium is discussed, from the importance of establishing the proper communication channels for all stakeholders to technical aspects related to -omics data. For federated learning projects focused on secondary use of health data for predictive modeling, encompassing multiple data modalities, a phase of data model convergence is sorely needed to gather different data representations developed in the context of medical research, interoperability of clinical care software, imaging, and -omics analysis into a coherent, unified data model. 
Our work identifies this need and presents our experience and a list of actionable lessons learned for future work in this direction.
Affiliation(s)
- Francesco Cremonesi
  - Université Côte d'Azur, Inria Sophia Antipolis-Méditerranée, Epione Research Project, France
  - Datawizard S.r.l, Rome, Italy
- Varvara Kalokyri
  - Institute of Computer Science, Foundation for Research and Technology - Hellas, Crete, Greece
- Haridimos Kondylakis
  - Institute of Computer Science, Foundation for Research and Technology - Hellas, Crete, Greece
- Tiziana Sanavia
  - Department of Medical Sciences, University of Torino, Torino, Italy
- Babita Singh
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
- Silvia Uribe
  - Escuela Técnica Superior de Ingeniería de Sistemas Informáticos, Universidad Politécnica de Madrid, Madrid, Spain
|
36
|
Abdullahi IY, Raab R, Küderle A, Eskofier B. Aligning Federated Learning with Existing Trust Structures in Health Care Systems. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:5378. [PMID: 37047992 PMCID: PMC10094512 DOI: 10.3390/ijerph20075378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2023] [Revised: 03/17/2023] [Accepted: 03/29/2023] [Indexed: 06/19/2023]
Abstract
Patient-centered health care information systems (PHSs) on peer-to-peer (P2P) networks (e.g., decentralized personal health records) enable storing data locally at the edge to enhance data sovereignty and resilience to single points of failure. Nonetheless, these systems raise concerns on trust and adoption in medical workflow due to non-alignment to current health care processes and stakeholders' needs. The distributed nature of the data makes it more challenging to train and deploy machine learning models (using traditional methods) at the edge, for instance, for disease prediction. Federated learning (FL) has been proposed as a possible solution to these limitations. However, the P2P PHS architecture challenges current FL solutions because they use centralized engines (or random entities that could pose privacy concerns) for model update aggregation. Consequently, we propose a novel conceptual FL framework, CareNetFL, that is suitable for P2P PHS multi-tier and hybrid architecture and leverages existing trust structures in health care systems to ensure scalability, trust, and security. Entrusted parties (practitioners' nodes) are used in CareNetFL to aggregate local model updates in the network hierarchy for their patients instead of random entities that could actively become malicious. Involving practitioners in their patients' FL model training increases trust and eases access to medical data. The proposed concepts mitigate communication latency and improve FL performance through patient-practitioner clustering, reducing skewed and imbalanced data distributions and system heterogeneity challenges of FL at the edge. The framework also ensures end-to-end security and accountability through leveraging identity-based systems and privacy-preserving techniques that only guarantee security during training.
|
37
|
Rajagopal A, Redekop E, Kemisetti A, Kulkarni R, Raman S, Sarma K, Magudia K, Arnold CW, Larson PEZ. Federated Learning with Research Prototypes: Application to Multi-Center MRI-based Detection of Prostate Cancer with Diverse Histopathology. Acad Radiol 2023; 30:644-657. [PMID: 36914501 PMCID: PMC10869141 DOI: 10.1016/j.acra.2023.02.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 03/13/2023]
Abstract
RATIONALE AND OBJECTIVES Early prostate cancer detection and staging from MRI is extremely challenging for both radiologists and deep learning algorithms, but the potential to learn from large and diverse datasets remains a promising avenue to increase their performance within and across institutions. To enable this for prototype-stage algorithms, where the majority of existing research remains, we introduce a flexible federated learning framework for cross-site training, validation, and evaluation of custom deep learning prostate cancer detection algorithms. MATERIALS AND METHODS We introduce an abstraction of prostate cancer groundtruth that represents diverse annotation and histopathology data. We maximize use of this groundtruth, if and when it is available, using UCNet, a custom 3D UNet that enables simultaneous supervision of pixel-wise, region-wise, and gland-wise classification. We leverage these modules to perform cross-site federated training using 1400+ heterogeneous multi-parametric prostate MRI exams from two university hospitals. RESULTS We observe a positive result, with significant improvements in cross-site generalization performance with negligible intra-site performance degradation for both lesion segmentation and per-lesion binary classification of clinically-significant prostate cancer. Cross-site lesion segmentation performance intersection-over-union (IoU) improved by 100%, while cross-site lesion classification performance overall accuracy improved by 9.5-14.8%, depending on the optimal checkpoint selected by each site. CONCLUSION Federated learning can improve the generalization performance of prostate cancer detection models across institutions while protecting patient health information and institution-specific code and data. However, even more data and participating institutions are likely required to improve the absolute performance of prostate cancer classification models. 
To enable adoption of federated learning with limited re-engineering of federated components, we open-source our FLtools system at https://federated.ucsf.edu, including examples that can be easily adapted to other medical imaging deep learning projects.
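For readers unfamiliar with the segmentation metric reported above, the following minimal sketch shows how intersection-over-union (IoU) is computed for binary masks and what a relative improvement of 100% means. It is not taken from the FLtools code base; the function names and the numeric values are hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
from typing import Sequence


def iou(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Intersection-over-union of two flattened binary masks (0/1 values)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def relative_improvement(before: float, after: float) -> float:
    """Relative change in percent, e.g. 0.2 -> 0.4 is +100%."""
    return (after - before) / before * 100.0


# Illustrative masks: one overlapping pixel out of three marked pixels.
example_iou = iou([1, 1, 0, 0], [1, 0, 1, 0])

# Illustrative numbers only (not from the paper): a doubling of IoU.
example_gain = relative_improvement(0.2, 0.4)
```

A 100% relative improvement therefore corresponds to a doubling of the cross-site IoU, which says nothing by itself about the absolute value reached.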
Affiliation(s)
- Abhejit Rajagopal
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, 94158, USA
- Ekaterina Redekop
  - Departments of Radiology and Electrical Engineering, University of California, Los Angeles, 90024, USA
- Anil Kemisetti
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, 94158, USA
- Rushikesh Kulkarni
  - Departments of Radiology and Electrical Engineering, University of California, Los Angeles, 90024, USA
- Steven Raman
  - Departments of Radiology and Electrical Engineering, University of California, Los Angeles, 90024, USA
- Karthik Sarma
  - Departments of Radiology and Electrical Engineering, University of California, Los Angeles, 90024, USA
- Kirti Magudia
  - Department of Radiology, Duke University, Durham, 27708, USA
- Corey W Arnold
  - Departments of Radiology and Electrical Engineering, University of California, Los Angeles, 90024, USA
- Peder E Z Larson
  - Department of Radiology and Biomedical Imaging, University of California, San Francisco, 94158, USA
|
38
|
Hughes JH, Woo KH, Keizer RJ, Goswami S. Clinical Decision Support for Precision Dosing: Opportunities for Enhanced Equity and Inclusion in Health Care. Clin Pharmacol Ther 2023; 113:565-574. [PMID: 36408716 DOI: 10.1002/cpt.2799] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/01/2022] [Accepted: 11/13/2022] [Indexed: 11/22/2022]
Abstract
Precision dosing aims to tailor doses to individual patients with the goal of improving treatment efficacy and avoiding toxicity. Clinical decision support software (CDSS) plays a crucial role in mediating this process, translating knowledge derived from clinical trials and real-world data (RWD) into actionable insights for clinicians to use at the point of care. However, not all patient populations are proportionally represented in clinical trials and other data sources that inform CDSS tools, limiting the applicability of these tools for underrepresented populations. Here, we review some of the limitations of existing CDSS tools and discuss methods for overcoming these gaps. We discuss considerations for study design and modeling to create more inclusive CDSS, particularly with an eye toward better incorporation of biological indicators in place of race, ethnicity, or sex. We also review inclusive practices for collection of these demographic data, during both study design and in software user interface design. Because of the role CDSS plays in both recording routine clinical care data and disseminating knowledge derived from data, CDSS presents a promising opportunity to continuously improve precision dosing algorithms using RWD to better reflect the diversity of patient populations.
Affiliation(s)
- Kara H Woo
  - InsightRX, San Francisco, California, USA
|
39
|
Dasaradharami Reddy K, Gadekallu TR. A Comprehensive Survey on Federated Learning Techniques for Healthcare Informatics. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:8393990. [PMID: 36909974 PMCID: PMC9995203 DOI: 10.1155/2023/8393990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Revised: 04/18/2022] [Accepted: 05/18/2022] [Indexed: 03/06/2023]
Abstract
Healthcare is predominantly regarded as a crucial consideration in promoting the general physical and mental health and well-being of people around the world. The amount of data generated by healthcare systems is enormous, making it challenging to manage. Many machine learning (ML) approaches were implemented to develop dependable and robust solutions to handle the data. ML cannot fully utilize data due to privacy concerns. This primarily happens in the case of medical data. Due to a lack of precise clinical data, the application of ML for the same is challenging and may not yield desired results. Federated learning (FL), which is a recent development in ML where the computation is offloaded to the source of data, appears to be a promising solution to this problem. In this study, we present a detailed survey of applications of FL for healthcare informatics. We initiate a discussion on the need for FL in the healthcare domain, followed by a review of recent review papers. We focus on the fundamentals of FL and the major motivations behind FL for healthcare applications. We then present the applications of FL along with recent state of the art in several verticals of healthcare. Then, lessons learned, open issues, and challenges that are yet to be solved are also highlighted. This is followed by future directions that give directions to the prospective researchers willing to do their research in this domain.
Affiliation(s)
- K. Dasaradharami Reddy
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
- Thippa Reddy Gadekallu
  - School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
|
40
|
Moshawrab M, Adda M, Bouzouane A, Ibrahim H, Raad A. Reviewing Federated Machine Learning and Its Use in Diseases Prediction. SENSORS (BASEL, SWITZERLAND) 2023; 23:s23042112. [PMID: 36850717 PMCID: PMC9958993 DOI: 10.3390/s23042112] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/20/2023] [Revised: 02/04/2023] [Accepted: 02/09/2023] [Indexed: 05/31/2023]
Abstract
Machine learning (ML) has succeeded in improving our daily routines by enabling automation and improved decision making in a variety of industries such as healthcare, finance, and transportation, resulting in increased efficiency and production. However, the development and widespread use of this technology has been significantly hampered by concerns about data privacy, confidentiality, and sensitivity, particularly in healthcare and finance. The "data hunger" of ML describes how additional data can increase performance and accuracy, which is why this question arises. Federated learning (FL) has emerged as a technology that helps solve the privacy problem by eliminating the need to send data to a primary server and collect it where it is processed and the model is trained. To maintain privacy and improve model performance, FL shares parameters rather than data during training, in contrast to the typical ML practice of sending user data during model development. Although FL is still in its infancy, there are already applications in various industries such as healthcare, finance, transportation, and others. In addition, 32% of companies have implemented or plan to implement federated learning in the next 12-24 months, according to the latest figures from KPMG, which forecasts an increase in investment in this area from USD 107 million in 2020 to USD 538 million in 2025. In this context, this article reviews federated learning, describes it technically, differentiates it from other technologies, and discusses current FL aggregation algorithms. It also discusses the use of FL in the diagnosis of cardiovascular disease, diabetes, and cancer. Finally, the problems hindering progress in this area and future strategies to overcome these limitations are discussed in detail.
Affiliation(s)
- Mohammad Moshawrab
  - Département de Mathématiques, Informatique et Génie, Université du Québec à Rimouski, 300 Allée des Ursulines, Rimouski, QC G5L 3A1, Canada
- Mehdi Adda
  - Département de Mathématiques, Informatique et Génie, Université du Québec à Rimouski, 300 Allée des Ursulines, Rimouski, QC G5L 3A1, Canada
- Abdenour Bouzouane
  - Département d’Informatique et de Mathématique, Université du Québec à Chicoutimi, 555 Boulevard de l’Université, Chicoutimi, QC G7H 2B1, Canada
- Hussein Ibrahim
  - Institut Technologique de Maintenance Industrielle, 175 Rue de la Vérendrye, Sept-Îles, QC G4R 5B7, Canada
- Ali Raad
  - Faculty of Arts & Sciences, Islamic University of Lebanon, Wardaniyeh P.O. Box 30014, Lebanon
|
41
|
Federated machine learning in data-protection-compliant research. NAT MACH INTELL 2023. [DOI: 10.1038/s42256-022-00601-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
|
42
|
Belhadi A, Holland JO, Yazidi A, Srivastava G, Lin JCW, Djenouri Y. BIoMT-ISeg: Blockchain internet of medical things for intelligent segmentation. Front Physiol 2023; 13:1097204. [PMID: 36714314 PMCID: PMC9879662 DOI: 10.3389/fphys.2022.1097204] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/13/2022] [Accepted: 12/20/2022] [Indexed: 01/13/2023] Open
Abstract
In the quest of training complicated medical data for Internet of Medical Things (IoMT) scenarios, this study develops an end-to-end intelligent framework that incorporates ensemble learning, genetic algorithms, blockchain technology, and various U-Net based architectures. Genetic algorithms are used to optimize the hyper-parameters of the used architectures. The training process was also protected with the help of blockchain technology. Finally, an ensemble learning system based on voting mechanism was developed to combine local outputs of various segmentation models into a global output. Our method shows that strong performance in a condensed number of epochs may be achieved with a high learning rate and a small batch size. As a result, we are able to perform better than standard solutions for well-known medical databases. In fact, the proposed solution reaches 95% of intersection over the union, compared to the baseline solutions where they are below 80%. Moreover, with the proposed blockchain strategy, the detected attacks reached 76%.
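The voting step that combines local segmentation outputs into a global output can be sketched as a pixel-wise majority vote. The abstract does not state which voting rule the authors use, so the strict-majority rule below, with ties resolved to background, is an assumption, and `majority_vote` is a hypothetical name.

```python
from typing import Sequence


def majority_vote(masks: Sequence[Sequence[int]]) -> list[int]:
    """Combine binary masks pixel by pixel; a pixel is foreground (1)
    only if strictly more than half of the models mark it as foreground."""
    if not masks:
        raise ValueError("need at least one mask")
    n_pixels = len(masks[0])
    if any(len(m) != n_pixels for m in masks):
        raise ValueError("all masks must have the same number of pixels")
    n_models = len(masks)
    return [
        1 if 2 * sum(m[i] for m in masks) > n_models else 0
        for i in range(n_pixels)
    ]


# Three toy model outputs over four pixels (illustrative values only).
local_outputs = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
global_output = majority_vote(local_outputs)
```

Each of the first three pixels receives two of three votes and is kept, while the last pixel receives none and is dropped.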
Affiliation(s)
- Asma Belhadi
  - School of Economics, Innovation and Technology, Kristiania University College, Oslo, Norway
- Anis Yazidi
  - Department of Computer Science, OsloMet, Oslo, Norway
- Gautam Srivastava
  - Brandon University, Brandon, MB, Canada
  - China Medical University, Taichung, Taiwan
  - Lebanese American University, Beirut, Lebanon
- Jerry Chun-Wei Lin
  - Western Norway University of Applied Sciences, Bergen, Norway (corresponding author)
|
43
|
Díaz JSP, García ÁL. Study of the performance and scalability of federated learning for medical imaging with intermittent clients. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
44
|
Design and Implementation of a Comprehensive AI Dashboard for Real-Time Prediction of Adverse Prognosis of ED Patients. Healthcare (Basel) 2022; 10:healthcare10081498. [PMID: 36011155 PMCID: PMC9408009 DOI: 10.3390/healthcare10081498] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/28/2022] [Revised: 08/02/2022] [Accepted: 08/03/2022] [Indexed: 11/16/2022] Open
Abstract
The emergency department (ED) is at the forefront of medical care, and the medical team needs to make outright judgments and treatment decisions under time constraints. Thus, knowing how to make personalized and precise predictions is a very challenging task. With the advancement of artificial intelligence (AI) technology, Chi Mei Medical Center (CMMC) adopted AI, the Internet of Things (IoT), and interaction technologies to establish diverse prognosis prediction models for eight diseases based on the ED electronic medical records of three branch hospitals. CMMC integrated these predictive models to form a digital AI dashboard, showing the risk status of all ED patients diagnosed with any of these eight diseases. This study first explored the methodology of CMMC’s AI development and proposed a four-tier AI dashboard architecture for ED implementation. The AI dashboard’s ease of use, usefulness, and acceptance was also strongly affirmed by the ED medical staff. The ED AI dashboard is an effective tool in the implementation of real-time risk monitoring of patients in the ED and could improve the quality of care as a part of best practice. Based on the results of this study, it is suggested that healthcare institutions thoughtfully consider tailoring their ED dashboard designs to adapt to their unique workflows and environments.
|
45
|
Comparative Review of the Intrusion Detection Systems Based on Federated Learning: Advantages and Open Challenges. ALGORITHMS 2022. [DOI: 10.3390/a15070247] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
In order to provide an accurate and timely response to different types of the attacks, intrusion and anomaly detection systems collect and analyze a lot of data that may include personal and other sensitive data. These systems could be considered a source of privacy-aware risks. Application of the federated learning paradigm for training attack and anomaly detection models may significantly decrease such risks as the data generated locally are not transferred to any party, and training is performed mainly locally on data sources. Another benefit of the usage of federated learning for intrusion detection is its ability to support collaboration between entities that could not share their dataset for confidential or other reasons. While this approach is able to overcome the aforementioned challenges it is rather new and not well-researched. The challenges and research questions appear while using it to implement analytical systems. In this paper, the authors review existing solutions for intrusion and anomaly detection based on the federated learning, and study their advantages as well as open challenges still facing them. The paper analyzes the architecture of the proposed intrusion detection systems and the approaches used to model data partition across the clients. The paper ends with discussion and formulation of the open challenges.
|
46
|
BFV-Based Homomorphic Encryption for Privacy-Preserving CNN Models. CRYPTOGRAPHY 2022. [DOI: 10.3390/cryptography6030034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Medical data is frequently quite sensitive in terms of data privacy and security. Federated learning has been used to increase the privacy and security of medical data, which is a sort of machine learning technique. The training data is disseminated across numerous machines in federated learning, and the learning process is collaborative. There are numerous privacy attacks on deep learning (DL) models that attackers can use to obtain sensitive information. As a result, the DL model should be safeguarded from adversarial attacks, particularly in medical data applications. Homomorphic encryption-based model security from the adversarial collaborator is one of the answers to this challenge. Using homomorphic encryption, this research presents a privacy-preserving federated learning system for medical data. The proposed technique employs a secure multi-party computation protocol to safeguard the deep learning model from adversaries. The proposed approach is tested in terms of model performance using a real-world medical dataset in this paper.
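The idea of aggregating model updates without revealing any single participant's contribution can be illustrated with additive secret sharing, a simple form of secure multi-party computation. This is explicitly not the BFV homomorphic encryption scheme the paper is built on, and it is not the authors' protocol; it is a toy sketch under stated assumptions: updates are already encoded as non-negative integers whose sum is smaller than the hypothetical constant `MODULUS`, and all parties follow the protocol honestly.

```python
import secrets

# Hypothetical modulus; all arithmetic on shares is done modulo this value.
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: list[int]) -> int:
    """Simulate all parties in one process: each party shares its value,
    each party adds up the shares it receives, and only those partial
    sums are combined into the final total."""
    n = len(private_values)
    # Row i holds the shares that party i sends out, one per recipient.
    all_shares = [share(v, n) for v in private_values]
    # Party j only ever sees column j, which reveals nothing on its own.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS
        for j in range(n)
    ]
    return sum(partial_sums) % MODULUS


# Three sites with integer-encoded updates (illustrative values only).
total = secure_sum([3, 5, 7])
```

The aggregate is recovered exactly while each individual share is uniformly random, which is the property that homomorphic schemes such as BFV provide with stronger guarantees and without requiring pairwise communication.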
|
47
|
Smart Services in Smart Cities: Insights from Science Mapping Analysis. SUSTAINABILITY 2022. [DOI: 10.3390/su14116506] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/27/2023]
Abstract
Against the backdrop of the expanding debate on smart cities, the objective of this paper is to examine to what extent and to what end the connection between smart services and smart cities has been explored in the literature, and what to make of it. It is argued that smart services, including demand- and innovation-driven service development, constitute an essential part of the broad concept of smart city. Viewed in this way, smart services serve as one of the key levers through which smart cities grow, develop, and build their resilience. By placing the analysis in the broader context of the smart city as smart service system, this paper sheds light on the still underexplored fields of research and suggests how they could be examined. For the purpose of the analysis, the Science Mapping (SciMat) method is employed as it allows to quantify and to visualize research output featured in Scopus and Web of Science (WoS), thus aiding the analysis. The added value of this paper is two-fold, i.e., (i) the SciMat analysis identifies the key dimensions of the nascent smart services in smart cities debate, and consequently, (ii) allows for suggesting topics that should be further investigated to detect the drivers for cities’ growth, resilience, and sustainability.
|
48
|
A Review on Federated Learning and Machine Learning Approaches: Categorization, Application Areas, and Blockchain Technology. INFORMATION 2022. [DOI: 10.3390/info13050263] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/04/2023] Open
Abstract
Federated learning (FL) is a scheme in which several clients work collectively to solve machine learning (ML) problems, with a central coordinator synchronizing the procedure. This design allows the training data to remain distributed, ensuring that each individual device’s data stay private. The paper systematically reviewed the available literature following the Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) guidelines. The study presents a systematic review of applicable ML approaches for FL, reviews the categorization of FL, discusses FL application areas, presents the relationship between FL and Blockchain Technology (BT), and discusses some existing literature that has used FL and ML approaches. The study also examined applicable machine learning models for federated learning. The inclusion criteria were (i) published between 2017 and 2021, (ii) written in English, (iii) published in a peer-reviewed scientific journal, and (iv) preprint published papers. Studies were removed from the review if they were (i) unpublished studies, theses, or dissertations, (ii) conference papers, (iii) not in English, or (iv) did not use artificial intelligence models and blockchain technology. In total, 84 eligible papers were examined in this study. In recent years, the amount of research on ML using FL has increased. Accuracy equivalent to standard feature-based techniques has been attained, and ensembles of many algorithms may yield even better results. We found that the best results were obtained from a hybrid design of an ML ensemble employing expert features. However, some additional difficulties and issues need to be overcome, such as efficiency, complexity, and smaller datasets. In addition, novel FL applications should be investigated from the standpoint of datasets and methodologies.
|