1
Tajabadi M, Martin R, Heider D. Privacy-preserving decentralized learning methods for biomedical applications. Comput Struct Biotechnol J 2024; 23:3281-3287. [PMID: 39296807 PMCID: PMC11408144 DOI: 10.1016/j.csbj.2024.08.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 08/26/2024] [Accepted: 08/26/2024] [Indexed: 09/21/2024]
Abstract
In recent years, decentralized machine learning has emerged as a significant advancement in biomedical applications, offering robust solutions for data privacy, security, and collaboration across diverse healthcare environments. In this review, we examine various decentralized learning methodologies, including federated learning, split learning, swarm learning, gossip learning, edge learning, and some of their applications in the biomedical field. We delve into the underlying principles, network topologies, and communication strategies of each approach, highlighting their advantages and limitations. Ultimately, the selection of a suitable method should be based on specific needs, infrastructures, and computational capabilities.
Affiliation(s)
- Mohammad Tajabadi
  - Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
  - Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Roman Martin
  - Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
  - Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
- Dominik Heider
  - Institute of Computer Science, Heinrich-Heine-University Duesseldorf, Graf-Adolf-Str. 63, Duesseldorf, 40215, North Rhine-Westphalia, Germany
  - Center for Digital Medicine, Heinrich-Heine-University Duesseldorf, Moorenstr. 5, Duesseldorf, 40215, North Rhine-Westphalia, Germany
2
Zhang C, Zhou Z, Peng L. Letter to the Editor: "Prediction models for differentiating benign from malignant liver lesions based on multiparametric dual-energy non-contrast CT". Eur Radiol 2024:10.1007/s00330-024-11181-w. [PMID: 39514112 DOI: 10.1007/s00330-024-11181-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Revised: 08/28/2024] [Accepted: 10/11/2024] [Indexed: 11/16/2024]
Affiliation(s)
- Chenwen Zhang
  - Department of Imaging, The Second People's Hospital of Jingdezhen, Jingdezhen, China
- Zhanmei Zhou
  - Department of Imaging, The Second People's Hospital of Jingdezhen, Jingdezhen, China
- Liang Peng
  - Department of Imaging, The Second People's Hospital of Jingdezhen, Jingdezhen, China
3
Probul N, Huang Z, Saak CC, Baumbach J, List M. AI in microbiome-related healthcare. Microb Biotechnol 2024; 17:e70027. [PMID: 39487766 PMCID: PMC11530995 DOI: 10.1111/1751-7915.70027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2024] [Accepted: 09/23/2024] [Indexed: 11/04/2024]
Abstract
Artificial intelligence (AI) has the potential to transform clinical practice and healthcare. Following impressive advancements in fields such as computer vision and medical imaging, AI is poised to drive changes in microbiome-based healthcare while facing challenges specific to the field. This review describes the state-of-the-art use of AI in microbiome-related healthcare. It points out limitations across topics such as data handling, AI modelling and safeguarding patient privacy. Furthermore, we indicate how these current shortcomings could be overcome in the future and discuss the influence and opportunities of increasingly complex data on microbiome-based healthcare.
Affiliation(s)
- Niklas Probul
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Zihua Huang
  - Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
  - Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Markus List
  - Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
  - Munich Data Science Institute, Technical University of Munich, Garching, Germany
4
Hausleitner C, Mueller H, Holzinger A, Pfeifer B. Collaborative weighting in federated graph neural networks for disease classification with the human-in-the-loop. Sci Rep 2024; 14:21839. [PMID: 39294334 PMCID: PMC11410954 DOI: 10.1038/s41598-024-72748-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Accepted: 09/10/2024] [Indexed: 09/20/2024]
Abstract
The authors introduce a novel framework that integrates federated learning with Graph Neural Networks (GNNs) to classify diseases, incorporating Human-in-the-Loop methodologies. This advanced framework innovatively employs collaborative voting mechanisms on subgraphs within a Protein-Protein Interaction (PPI) network, situated in a federated ensemble-based deep learning context. This methodological approach marks a significant stride in the development of explainable and privacy-aware Artificial Intelligence, significantly contributing to the progression of personalized digital medicine in a responsible and transparent manner.
Affiliation(s)
- Christian Hausleitner
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
- Heimo Mueller
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
- Andreas Holzinger
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
  - Human-Centered AI Lab, Institute of Forest Engineering, Department of Forest and Soil Sciences, University of Natural Resources and Life Sciences Vienna, 1190, Vienna, Austria
  - Alberta Machine Intelligence Institute, Edmonton, T6G 2R3, Canada
- Bastian Pfeifer
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, 8036, Graz, Austria
5
Yuan W, Li Y, Han Z, Chen Y, Xie J, Chen J, Bi Z, Xi J. Evolutionary Mechanism Based Conserved Gene Expression Biclustering Module Analysis for Breast Cancer Genomics. Biomedicines 2024; 12:2086. [PMID: 39335599 PMCID: PMC11428256 DOI: 10.3390/biomedicines12092086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Revised: 08/23/2024] [Accepted: 09/02/2024] [Indexed: 09/30/2024]
Abstract
The identification of significant gene biclusters with particular expression patterns and the elucidation of functionally related genes within gene expression data has become a critical concern due to the vast amount of gene expression data generated by RNA sequencing technology. In this paper, a Conserved Gene Expression Module based on Genetic Algorithm (CGEMGA) is proposed. Breast cancer data from the TCGA database is used as the subject of this study. The p-values from Fisher's exact test are used as evaluation metrics to demonstrate the significance of different algorithms, including the Cheng and Church algorithm, CGEM algorithm, etc. In addition, the F-test is used to investigate the difference between our method and the CGEM algorithm. The computational cost of the different algorithms is further investigated by calculating the running time of each algorithm. Finally, the established driver genes and cancer-related pathways are used to validate the process. The results of 10 independent runs demonstrate that CGEMGA has a superior average p-value of 1.54 × 10⁻⁴ ± 3.06 × 10⁻⁵ compared to all other algorithms. Furthermore, our approach exhibits consistent performance across all methods. The F-test yields a p-value of 0.039, indicating a significant difference between our approach and the CGEM. Computational cost statistics also demonstrate that our approach has a significantly shorter average runtime of 5.22 × 10⁰ ± 1.65 × 10⁻¹ s compared to the other algorithms. Enrichment analysis indicates that the genes in our approach are significantly enriched for driver genes. Our algorithm is fast and robust, efficiently extracting co-expressed genes and associated co-expression condition biclusters from RNA-seq data.
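To illustrate the kind of Fisher's exact test evaluation mentioned in the abstract above, the following minimal Python sketch computes a one-sided enrichment p-value for the overlap between the genes of a bicluster and a set of known driver genes. It is not the authors' code: the function name, its parameters, and the toy numbers are hypothetical, and the one-sided (over-representation) form is an assumption, since the abstract does not state which variant was used.

```python
from math import comb


def fisher_enrichment_p(overlap: int, module_size: int,
                        driver_total: int, universe: int) -> float:
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns P(X >= overlap), where X is the number of driver genes found
    in a module of `module_size` genes drawn without replacement from
    `universe` genes, of which `driver_total` are driver genes.
    All argument names are illustrative assumptions.
    """
    if not (0 <= driver_total <= universe and 0 <= module_size <= universe):
        raise ValueError("set sizes must lie between 0 and the universe size")
    max_overlap = min(module_size, driver_total)
    if not (0 <= overlap <= max_overlap):
        raise ValueError("overlap is not achievable for these set sizes")
    # Number of ways to choose any module of this size from the universe.
    denominator = comb(universe, module_size)
    # Sum the counts of all modules with at least `overlap` driver genes.
    numerator = sum(
        comb(driver_total, k) * comb(universe - driver_total, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Toy example (made-up numbers): 10 genes, 5 drivers, module of 5 genes,
    # all 5 of which are drivers.
    print(fisher_enrichment_p(overlap=5, module_size=5,
                              driver_total=5, universe=10))
```

A smaller p-value indicates that the overlap with the driver gene set is less likely under random sampling; exact integer arithmetic via `math.comb` avoids rounding issues for small gene sets.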
Affiliation(s)
- Wei Yuan
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Yaming Li
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Zhengpan Han
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Yu Chen
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Jinnan Xie
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Jianguo Chen
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Zhisheng Bi
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
- Jianing Xi
  - School of Biomedical Engineering, Guangzhou Medical University, Guangzhou 511436, China
6
Pfeifer B, Sirocchi C, Bloice MD, Kreuzthaler M, Urschler M. Federated unsupervised random forest for privacy-preserving patient stratification. Bioinformatics 2024; 40:ii198-ii207. [PMID: 39230698 PMCID: PMC11373406 DOI: 10.1093/bioinformatics/btae382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/05/2024]
Abstract
MOTIVATION: In the realm of precision medicine, effective patient stratification and disease subtyping demand innovative methodologies tailored for multi-omics data. Clustering techniques applied to multi-omics data have become instrumental in identifying distinct subgroups of patients, enabling a finer-grained understanding of disease variability. Meanwhile, clinical datasets are often small and must be aggregated from multiple hospitals. Online data sharing, however, is seen as a significant challenge due to privacy concerns, potentially impeding big data's role in medical advancements using machine learning. This work establishes a powerful framework for advancing precision medicine through unsupervised random forest-based clustering in combination with federated computing.
RESULTS: We introduce a novel multi-omics clustering approach utilizing unsupervised random forests. The unsupervised nature of the random forest enables the determination of cluster-specific feature importance, unraveling key molecular contributors to distinct patient groups. Our methodology is designed for federated execution, a crucial aspect in the medical domain where privacy concerns are paramount. We have validated our approach on machine learning benchmark datasets as well as on cancer data from The Cancer Genome Atlas. Our method is competitive with the state-of-the-art in terms of disease subtyping, but at the same time substantially improves the cluster interpretability. Experiments indicate that local clustering performance can be improved through federated computing.
AVAILABILITY AND IMPLEMENTATION: The proposed methods are available as an R-package (https://github.com/pievos101/uRF).
Affiliation(s)
- Bastian Pfeifer
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
- Christel Sirocchi
  - Department of Pure and Applied Sciences, University of Urbino, Urbino, 61029, Italy
- Marcus D Bloice
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
- Markus Kreuzthaler
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
- Martin Urschler
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz, 8010, Austria
7
Bernardini LG, Rosinger C, Bodner G, Keiblinger KM, Izquierdo-Verdiguier E, Spiegel H, Retzlaff CO, Holzinger A. Learning vs. understanding: When does artificial intelligence outperform process-based modeling in soil organic carbon prediction? N Biotechnol 2024; 81:20-31. [PMID: 38462171 DOI: 10.1016/j.nbt.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 02/24/2024] [Accepted: 03/06/2024] [Indexed: 03/12/2024]
Abstract
In recent years, machine learning (ML) algorithms have gained substantial recognition for ecological modeling across various temporal and spatial scales. However, little evaluation has been conducted for the prediction of soil organic carbon (SOC) on small data sets commonly inherent to long-term soil ecological research. In this context, the performance of ML algorithms for SOC prediction has never been tested against traditional process-based modeling approaches. Here, we compare ML algorithms, calibrated and uncalibrated process-based models as well as multiple ensembles on their performance in predicting SOC using data from five long-term experimental sites (comprising 256 independent data points) in Austria. Using all available data, the ML-based approaches using random forest and support vector machines with a polynomial kernel were superior to all process-based models. However, the ML algorithms performed similarly or worse when the number of training samples was reduced or when a leave-one-site-out cross validation was applied. This emphasizes that the performance of ML algorithms is strongly dependent on the data-size related quality of learning information following the well-known curse of dimensionality phenomenon, while the accuracy of process-based models significantly relies on proper calibration and combination of different modeling approaches. Our study thus suggests a superiority of ML-based SOC prediction at scales where larger datasets are available, while process-based models are superior tools when targeting the exploration of underlying biophysical and biochemical mechanisms of SOC dynamics in soils. Therefore, we recommend applying ensembles of ML algorithms with process-based models to combine advantages inherent to both approaches.
Affiliation(s)
- Christoph Rosinger
  - Institute of Agronomy, University of Natural Resources and Life Sciences (BOKU) Vienna, Konrad Lorenz-Straße 24, 3430 Tulln an der Donau, Austria
  - Institute of Soil Research, University of Natural Resources and Life Sciences (BOKU) Vienna, Peter Jordan-Straße 82, 1190 Vienna, Austria
- Gernot Bodner
  - Institute of Agronomy, University of Natural Resources and Life Sciences (BOKU) Vienna, Konrad Lorenz-Straße 24, 3430 Tulln an der Donau, Austria
- Katharina M Keiblinger
  - Institute of Soil Research, University of Natural Resources and Life Sciences (BOKU) Vienna, Peter Jordan-Straße 82, 1190 Vienna, Austria
- Emma Izquierdo-Verdiguier
  - Institute of Geomatics, University of Natural Resources and Life Sciences (BOKU) Vienna, Peter Jordan-Straße 82, 1190 Vienna, Austria
- Heide Spiegel
  - Austrian Agency for Health and Food Safety (AGES), Institute for Soil Health and Plant Nutrition, Spargelfeldstraße 191, 1226 Vienna, Austria
- Carl O Retzlaff
  - Human-Centered AI Lab, Institute of Forest Engineering, University of Natural Resources and Life Sciences (BOKU) Vienna, Peter Jordan-Straße 82, 1190 Vienna, Austria
- Andreas Holzinger
  - Human-Centered AI Lab, Institute of Forest Engineering, University of Natural Resources and Life Sciences (BOKU) Vienna, Peter Jordan-Straße 82, 1190 Vienna, Austria
8
Späth J, Sewald Z, Probul N, Berland M, Almeida M, Pons N, Le Chatelier E, Ginès P, Solé C, Juanola A, Pauling J, Baumbach J. Privacy-Preserving Federated Survival Support Vector Machines for Cross-Institutional Time-To-Event Analysis: Algorithm Development and Validation. JMIR AI 2024; 3:e47652. [PMID: 38875678 PMCID: PMC11041494 DOI: 10.2196/47652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 08/06/2023] [Accepted: 02/10/2024] [Indexed: 06/16/2024]
Abstract
BACKGROUND: Central collection of distributed medical patient data is problematic due to strict privacy regulations. Especially in clinical environments, such as clinical time-to-event studies, large sample sizes are critical but usually not available at a single institution. It has been shown recently that federated learning, combined with privacy-enhancing technologies, is an excellent and privacy-preserving alternative to data sharing.
OBJECTIVE: This study aims to develop and validate a privacy-preserving, federated survival support vector machine (SVM) and make it accessible for researchers to perform cross-institutional time-to-event analyses.
METHODS: We extended the survival SVM algorithm to be applicable in federated environments. We further implemented it as a FeatureCloud app, enabling it to run in the federated infrastructure provided by the FeatureCloud platform. Finally, we evaluated our algorithm on 3 benchmark data sets, a large sample size synthetic data set, and a real-world microbiome data set and compared the results to the corresponding central method.
RESULTS: Our federated survival SVM produces highly similar results to the centralized model on all data sets. The maximal difference between the model weights of the central model and the federated model was only 0.001, and the mean difference over all data sets was 0.0002. We further show that by including more data in the analysis through federated learning, predictions are more accurate even in the presence of site-dependent batch effects.
CONCLUSIONS: The federated survival SVM extends the palette of federated time-to-event analysis methods by a robust machine learning approach. To our knowledge, the implemented FeatureCloud app is the first publicly available implementation of a federated survival SVM, is freely accessible for all kinds of researchers, and can be directly used within the FeatureCloud platform.
Affiliation(s)
- Julian Späth
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Zeno Sewald
  - LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Niklas Probul
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Magali Berland
  - MetaGenoPolis, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Mathieu Almeida
  - MetaGenoPolis, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Nicolas Pons
  - MetaGenoPolis, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
- Pere Ginès
  - Liver Unit, Hospital Clínic de Barcelona, Barcelona, Spain
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
  - Centro de Investigacion en Red de Enfermedades hepaticas y Digestivas (CIBEReHD), Madrid, Spain
  - Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain
- Cristina Solé
  - Liver Unit, Hospital Clínic de Barcelona, Barcelona, Spain
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
  - Centro de Investigacion en Red de Enfermedades hepaticas y Digestivas (CIBEReHD), Madrid, Spain
- Adrià Juanola
  - Liver Unit, Hospital Clínic de Barcelona, Barcelona, Spain
  - Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
  - Centro de Investigacion en Red de Enfermedades hepaticas y Digestivas (CIBEReHD), Madrid, Spain
- Josch Pauling
  - LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Jan Baumbach
  - Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
9
Pezoulas VC, Kalatzis F, Exarchos TP, Goules A, Tzioufas AG, Fotiadis DI. FHBF: Federated hybrid boosted forests with dropout rates for supervised learning tasks across highly imbalanced clinical datasets. Patterns (New York, N.Y.) 2024; 5:100893. [PMID: 38264722 PMCID: PMC10801222 DOI: 10.1016/j.patter.2023.100893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 11/03/2023] [Accepted: 11/10/2023] [Indexed: 01/25/2024]
Abstract
Although several studies have deployed gradient boosting trees (GBT) as a robust classifier for federated learning tasks (federated GBT [FGBT]), even with dropout rates (federated gradient boosting trees with dropout rate [FDART]), none of them have investigated the overfitting effects of FGBT across heterogeneous and highly imbalanced datasets within federated environments nor the effect of dropouts in the loss function. In this work, we present the federated hybrid boosted forests (FHBF) algorithm, which incorporates a hybrid weight update approach to overcome ill-posed problems that arise from overfitting effects during the training across highly imbalanced datasets in the cloud. Eight case studies were conducted to stress the performance of FHBF against existing algorithms toward the development of robust AI models for lymphoma development across 18 European federated databases. Our results highlight the robustness of FHBF, yielding an average loss of 0.527 compared with FGBT (0.611) and FDART (0.584) with increased classification performance (0.938 sensitivity, 0.732 specificity).
Affiliation(s)
- Vasileios C Pezoulas
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
- Fanis Kalatzis
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
- Themis P Exarchos
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
  - Department of Informatics, Ionian University, 49100 Corfu, Greece
- Andreas Goules
  - Department of Pathophysiology, Faculty of Medicine, National and Kapodistrian University of Athens (NKUA), 15772 Athens, Greece
- Athanasios G Tzioufas
  - Department of Pathophysiology, Faculty of Medicine, National and Kapodistrian University of Athens (NKUA), 15772 Athens, Greece
- Dimitrios I Fotiadis
  - Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, 45110 Ioannina, Greece
  - Biomedical Research Institute, FORTH, 45110 Ioannina, Greece
10
Pfeifer B, Chereda H, Martin R, Saranti A, Clemens S, Hauschild AC, Beißbarth T, Holzinger A, Heider D. Ensemble-GNN: federated ensemble learning with graph neural networks for disease module discovery and classification. Bioinformatics 2023; 39:btad703. [PMID: 37988152 PMCID: PMC10684359 DOI: 10.1093/bioinformatics/btad703] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/13/2023] [Revised: 09/06/2023] [Accepted: 11/20/2023] [Indexed: 11/22/2023]
Abstract
SUMMARY: Federated learning enables collaboration in medicine, where data is scattered across multiple centers, without the need to aggregate the data in a central cloud. While, in general, machine learning models can be applied to a wide range of data types, graph neural networks (GNNs) are particularly developed for graphs, which are very common in the biomedical domain. For instance, a patient can be represented by a protein-protein interaction (PPI) network where the nodes contain the patient-specific omics features. Here, we present our Ensemble-GNN software package, which can be used to deploy federated, ensemble-based GNNs in Python. Ensemble-GNN allows users to quickly build predictive models utilizing PPI networks consisting of various node features such as gene expression and/or DNA methylation. As an example, we show the results from a public dataset of 981 patients and 8469 genes from the Cancer Genome Atlas (TCGA).
AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/pievos101/Ensemble-GNN, and the data at Zenodo (DOI: 10.5281/zenodo.8305122).
Affiliation(s)
- Bastian Pfeifer
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz 8036, Austria
- Hryhorii Chereda
  - Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37077, Germany
- Roman Martin
  - Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg 35043, Germany
- Anna Saranti
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz 8036, Austria
  - Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna 1190, Austria
- Sandra Clemens
  - Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg 35043, Germany
- Anne-Christin Hauschild
  - Institute for Medical Informatics, University Medical Center Göttingen, Göttingen 37075, Germany
- Tim Beißbarth
  - Medical Bioinformatics, University Medical Center Göttingen, Göttingen 37077, Germany
- Andreas Holzinger
  - Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Graz 8036, Austria
  - Human-Centered AI Lab, University of Natural Resources and Life Sciences, Vienna 1190, Austria
- Dominik Heider
  - Data Science in Biomedicine, Department of Mathematics and Computer Science, University of Marburg, Marburg 35043, Germany
11
Tajabadi M, Grabenhenrich L, Ribeiro A, Leyer M, Heider D. Sharing Data With Shared Benefits: Artificial Intelligence Perspective. J Med Internet Res 2023; 25:e47540. [PMID: 37642995 PMCID: PMC10498316 DOI: 10.2196/47540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 06/09/2023] [Accepted: 06/27/2023] [Indexed: 08/31/2023]
Abstract
Artificial intelligence (AI) and data sharing go hand in hand. In order to develop powerful AI models for medical and health applications, data need to be collected and brought together over multiple centers. However, due to various reasons, including data privacy, not all data can be made publicly available or shared with other parties. Federated and swarm learning can help in these scenarios. However, in the private sector, such as between companies, the incentive is limited, as the resulting AI models would be available for all partners irrespective of their individual contribution, including the amount of data provided by each party. Here, we explore a potential solution to this challenge as a viewpoint, aiming to establish a fairer approach that encourages companies to engage in collaborative data analysis and AI modeling. Within the proposed approach, each individual participant could gain a model commensurate with their respective data contribution, ultimately leading to better diagnostic tools for all participants in a fair manner.
Affiliation(s)
- Mohammad Tajabadi
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Linus Grabenhenrich
  - Department for Methods Development, Research Infrastructure and Information Technology, Robert Koch Institute, Berlin, Germany
- Adèle Ribeiro
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
- Michael Leyer
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
  - School of Management, Faculty of Business & Law, Queensland University of Technology, Brisbane, Australia
- Dominik Heider
  - Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Marburg, Germany
12
Oh W, Nadkarni GN. Federated Learning in Health care Using Structured Medical Data. Advances in Kidney Disease and Health 2023; 30:4-16. [PMID: 36723280 PMCID: PMC10208416 DOI: 10.1053/j.akdh.2022.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Abstract
The success of machine learning-based studies largely depends on access to a large amount of data. However, accessing such data is typically not feasible within a single health system/hospital. Although multicenter studies are the most effective way to access a vast amount of data, sharing data outside the institutes involves legal, business, and technical challenges. Federated learning (FL) is a newly proposed machine learning framework for multicenter studies, tackling data-sharing issues across participant institutes. The promise of FL is simple. FL facilitates multicenter studies without losing data access control and allows the construction of a global model by aggregating local models trained from participant institutes. This article reviewed recently published studies that utilized FL in clinical studies with structured medical data. In addition, challenges and open questions in FL in clinical studies with structured medical data were discussed.
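The aggregation step described in the abstract above, building a global model from locally trained models, can be sketched as a sample-size-weighted average of the local model weights (the common FedAvg-style rule). This is a minimal sketch under stated assumptions, not a method taken from the article: the function name, the flat weight-vector representation, and the example numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(local_weights: Sequence[Sequence[float]],
                      sample_counts: Sequence[int]) -> List[float]:
    """Aggregate local model weights into one global weight vector.

    Each institute contributes a flat weight vector and the number of
    samples it trained on; the global weights are the sample-size-weighted
    mean. Only weights and counts are exchanged, never patient records.
    (Names and data layout are illustrative assumptions.)
    """
    if not local_weights or len(local_weights) != len(sample_counts):
        raise ValueError("need one sample count per local model")
    dim = len(local_weights[0])
    if any(len(w) != dim for w in local_weights):
        raise ValueError("all local weight vectors must have the same length")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    # Weighted mean, coordinate by coordinate.
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(dim)
    ]


if __name__ == "__main__":
    # Two hypothetical hospitals with 100 and 300 patients.
    hospital_a = [1.0, 2.0]
    hospital_b = [3.0, 4.0]
    print(federated_average([hospital_a, hospital_b], [100, 300]))
```

Weighting by sample count lets larger sites influence the global model more; in practice this step would be repeated over several communication rounds, with each site retraining from the updated global weights.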
Affiliation(s)
- Wonsuk Oh
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
- Girish N Nadkarni
  - Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
  - Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY
  - Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY