1
|
Brauneck A, Schmalhorst L, Weiss S, Baumbach L, Völker U, Ellinghaus D, Baumbach J, Buchholtz G. Legal aspects of privacy-enhancing technologies in genome-wide association studies and their impact on performance and feasibility. Genome Biol 2024; 25:154. [PMID: 38872191 PMCID: PMC11170858 DOI: 10.1186/s13059-024-03296-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2023] [Accepted: 06/03/2024] [Indexed: 06/15/2024] Open
Abstract
Genomic data holds huge potential for medical progress but requires strict safety measures due to its sensitive nature to comply with data protection laws. This conflict is especially pronounced in genome-wide association studies (GWAS) which rely on vast amounts of genomic data to improve medical diagnoses. To ensure both their benefits and sufficient data security, we propose a federated approach in combination with privacy-enhancing technologies utilising the findings from a systematic review on federated learning and legal regulations in general and applying these to GWAS.
Collapse
Affiliation(s)
- Alissa Brauneck
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany.
| | - Louisa Schmalhorst
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany
| | - Stefan Weiss
- Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
| | - Linda Baumbach
- Department of Health Economics and Health Services Research, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
| | - Uwe Völker
- Interfaculty Institute of Genetics and Functional Genomics, Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
| | - David Ellinghaus
- Institute of Clinical Molecular Biology (IKMB), Kiel University and University Medical Center Schleswig-Holstein, Kiel, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Gabriele Buchholtz
- Hamburg University Faculty of Law, University of Hamburg, Hamburg, Germany
| |
Collapse
|
2
|
Zhou J, Huang C, Gao X. Patient privacy in AI-driven omics methods. Trends Genet 2024; 40:383-386. [PMID: 38637270 DOI: 10.1016/j.tig.2024.03.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2024] [Revised: 03/18/2024] [Accepted: 03/19/2024] [Indexed: 04/20/2024]
Abstract
Artificial intelligence (AI) in omics analysis raises privacy threats to patients. Here, we briefly discuss risk factors to patient privacy in data sharing, model training, and release, as well as methods to safeguard and evaluate patient privacy in AI-driven omics methods.
Collapse
Affiliation(s)
- Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia; Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia
| | - Chao Huang
- Ningbo Institute of Information Technology Application, Chinese Academy of Sciences (CAS), Ningbo, China
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia; Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955-6900, Kingdom of Saudi Arabia.
| |
Collapse
|
3
|
Späth J, Sewald Z, Probul N, Berland M, Almeida M, Pons N, Le Chatelier E, Ginès P, Solé C, Juanola A, Pauling J, Baumbach J. Privacy-Preserving Federated Survival Support Vector Machines for Cross-Institutional Time-To-Event Analysis: Algorithm Development and Validation. JMIR AI 2024; 3:e47652. [PMID: 38875678 PMCID: PMC11041494 DOI: 10.2196/47652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/30/2023] [Revised: 08/06/2023] [Accepted: 02/10/2024] [Indexed: 06/16/2024]
Abstract
BACKGROUND Central collection of distributed medical patient data is problematic due to strict privacy regulations. Especially in clinical environments, such as clinical time-to-event studies, large sample sizes are critical but usually not available at a single institution. It has been shown recently that federated learning, combined with privacy-enhancing technologies, is an excellent and privacy-preserving alternative to data sharing. OBJECTIVE This study aims to develop and validate a privacy-preserving, federated survival support vector machine (SVM) and make it accessible for researchers to perform cross-institutional time-to-event analyses. METHODS We extended the survival SVM algorithm to be applicable in federated environments. We further implemented it as a FeatureCloud app, enabling it to run in the federated infrastructure provided by the FeatureCloud platform. Finally, we evaluated our algorithm on 3 benchmark data sets, a large sample size synthetic data set, and a real-world microbiome data set and compared the results to the corresponding central method. RESULTS Our federated survival SVM produces highly similar results to the centralized model on all data sets. The maximal difference between the model weights of the central model and the federated model was only 0.001, and the mean difference over all data sets was 0.0002. We further show that by including more data in the analysis through federated learning, predictions are more accurate even in the presence of site-dependent batch effects. CONCLUSIONS The federated survival SVM extends the palette of federated time-to-event analysis methods by a robust machine learning approach. To our knowledge, the implemented FeatureCloud app is the first publicly available implementation of a federated survival SVM, is freely accessible for all kinds of researchers, and can be directly used within the FeatureCloud platform.
Collapse
Affiliation(s)
- Julian Späth
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Zeno Sewald
- LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Niklas Probul
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Magali Berland
- MetaGenoPolis, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
| | - Mathieu Almeida
- MetaGenoPolis, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
| | - Nicolas Pons
- MetaGenoPolis, INRAE, Université Paris-Saclay, Jouy-en-Josas, France
| | | | - Pere Ginès
- Liver Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Centro de Investigacion en Red de Enfermedades hepaticas y Digestivas (CIBEReHD), Madrid, Spain
- Faculty of Medicine and Health Sciences, University of Barcelona, Barcelona, Spain
| | - Cristina Solé
- Liver Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Centro de Investigacion en Red de Enfermedades hepaticas y Digestivas (CIBEReHD), Madrid, Spain
| | - Adrià Juanola
- Liver Unit, Hospital Clínic de Barcelona, Barcelona, Spain
- Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
- Centro de Investigacion en Red de Enfermedades hepaticas y Digestivas (CIBEReHD), Madrid, Spain
| | - Josch Pauling
- LipiTUM, Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| |
Collapse
|
4
|
Zhou J, Chen S, Wu Y, Li H, Zhang B, Zhou L, Hu Y, Xiang Z, Li Z, Chen N, Han W, Xu C, Wang D, Gao X. PPML-Omics: A privacy-preserving federated machine learning method protects patients' privacy in omic data. SCIENCE ADVANCES 2024; 10:eadh8601. [PMID: 38295178 PMCID: PMC10830108 DOI: 10.1126/sciadv.adh8601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/18/2023] [Accepted: 12/29/2023] [Indexed: 02/02/2024]
Abstract
Modern machine learning models toward various tasks with omic data analysis give rise to threats of privacy leakage of patients involved in those datasets. Here, we proposed a secure and privacy-preserving machine learning method (PPML-Omics) by designing a decentralized differential private federated learning algorithm. We applied PPML-Omics to analyze data from three sequencing technologies and addressed the privacy concern in three major tasks of omic data under three representative deep learning models. We examined privacy breaches in depth through privacy attack experiments and demonstrated that PPML-Omics could protect patients' privacy. In each of these applications, PPML-Omics was able to outperform methods of comparison under the same level of privacy guarantee, demonstrating the versatility of the method in simultaneously balancing the privacy-preserving capability and utility in omic data analysis. Furthermore, we gave the theoretical proof of the privacy-preserving capability of PPML-Omics, suggesting the first mathematically guaranteed method with robust and generalizable empirical performance in protecting patients' privacy in omic data.
Collapse
Affiliation(s)
- Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Siyuan Chen
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Yulian Wu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Haoyang Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Bin Zhang
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Longxi Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Yan Hu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Zihang Xiang
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Ningning Chen
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Chencheng Xu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Di Wang
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- Computational Bioscience Research Center, Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| |
Collapse
|
5
|
Raimondi D, Chizari H, Verplaetse N, Löscher BS, Franke A, Moreau Y. Genome interpretation in a federated learning context allows the multi-center exome-based risk prediction of Crohn's disease patients. Sci Rep 2023; 13:19449. [PMID: 37945674 PMCID: PMC10636050 DOI: 10.1038/s41598-023-46887-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2023] [Accepted: 11/06/2023] [Indexed: 11/12/2023] Open
Abstract
High-throughput sequencing allowed the discovery of many disease variants, but nowadays it is becoming clear that the abundance of genomics data mostly just moved the bottleneck in Genetics and Precision Medicine from a data availability issue to a data interpretation issue. To solve this empasse it would be beneficial to apply the latest Deep Learning (DL) methods to the Genome Interpretation (GI) problem, similarly to what AlphaFold did for Structural Biology. Unfortunately DL requires large datasets to be viable, and aggregating genomics datasets poses several legal, ethical and infrastructural complications. Federated Learning (FL) is a Machine Learning (ML) paradigm designed to tackle these issues. It allows ML methods to be collaboratively trained and tested on collections of physically separate datasets, without requiring the actual centralization of sensitive data. FL could thus be key to enable DL applications to GI on sufficiently large genomics data. We propose FedCrohn, a FL GI Neural Network model for the exome-based Crohn's Disease risk prediction, providing a proof-of-concept that FL is a viable paradigm to build novel ML GI approaches. We benchmark it in several realistic scenarios, showing that FL can indeed provide performances similar to conventional ML on centralized data, and that collaborating in FL initiatives is likely beneficial for most of the medical centers participating in them.
Collapse
Affiliation(s)
| | | | | | - Britt-Sabina Löscher
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- University Medical Center Schleswig-Holstein, Kiel, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Christian-Albrechts-University of Kiel, Kiel, Germany
- University Medical Center Schleswig-Holstein, Kiel, Germany
| | - Yves Moreau
- ESAT-STADIUS, KU Leuven, 3001, Leuven, Belgium
| |
Collapse
|
6
|
Xu X, Qi Z, Han X, Xu A, Geng Z, He X, Ren Y, Duo Z. Predicting anticancer drug sensitivity on distributed data sources using federated deep learning. Heliyon 2023; 9:e18615. [PMID: 37593639 PMCID: PMC10427996 DOI: 10.1016/j.heliyon.2023.e18615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2023] [Revised: 07/12/2023] [Accepted: 07/24/2023] [Indexed: 08/19/2023] Open
Abstract
Drug sensitivity prediction plays a crucial role in precision cancer therapy. Collaboration among medical institutions can lead to better performance in drug sensitivity prediction. However, patient privacy and data protection regulation remain a severe impediment to centralized prediction studies. For the first time, we proposed a federated drug sensitivity prediction model with high generalization, combining distributed data sources while protecting private data. Cell lines are first classified into three categories using the waterfall method. Focal loss for solving class imbalance is then embedded into the horizontal federated deep learning framework, i.e., HFDL-fl is presented. Applying HFDL-fl to homogeneous and heterogeneous data, we obtained HFDL-Cross and HFDL-Within. Our comprehensive experiments demonstrated that (i) collaboration by HFDL-fl outperforms private model on local data, (ii) focal loss function can effectively improve model performance to classify cell lines in sensitive and resistant categories, and (iii) HFDL-fl is not significantly affected by data heterogeneity. To summarize, HFDL-fl provides a valuable solution to break down the barriers between medical institutions for privacy-preserving drug sensitivity prediction and therefore facilitates the development of cancer precision medicine and other privacy-related biomedical research.
Collapse
Affiliation(s)
- Xiaolu Xu
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
| | - Zitong Qi
- Department of Statistics, University of Washington, Seattle, WA 98195, USA
| | - Xiumei Han
- College of Artificial Intelligence, Dalian Maritime University, Dalian 116026, China
| | - Aiguo Xu
- Department of Oncology, The Second People's Hospital of Lianyungang, Lianyungang 222023, China
| | - Zhaohong Geng
- Department of Cardiology, Second Affiliated Hospital of Dalian Medical University, Dalian 116023, China
| | - Xinyu He
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
| | - Yonggong Ren
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
| | - Zhaojun Duo
- School of Computer and Artificial Intelligence, Liaoning Normal University, Dalian 116029, China
| |
Collapse
|
7
|
Federated machine learning in data-protection-compliant research. NAT MACH INTELL 2023. [DOI: 10.1038/s42256-022-00601-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
|
8
|
Cao H, Zhang Y, Baumbach J, Burton PR, Dwyer D, Koutsouleris N, Matschinske J, Marcon Y, Rajan S, Rieg T, Ryser-Welch P, Späth J, Herrmann C, Schwarz E. dsMTL - a computational framework for privacy-preserving, distributed multi-task machine learning. Bioinformatics 2022; 38:4919-4926. [PMID: 36073911 DOI: 10.1093/bioinformatics/btac616] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2022] [Revised: 09/06/2022] [Accepted: 09/07/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION In multi-cohort machine learning studies, it is critical to differentiate between effects that are reproducible across cohorts and those that are cohort-specific. Multi-task learning (MTL) is a machine learning approach that facilitates this differentiation through the simultaneous learning of prediction tasks across cohorts. Since multi-cohort data can often not be combined into a single storage solution, there would be the substantial utility of an MTL application for geographically distributed data sources. RESULTS Here, we describe the development of "dsMTL", a computational framework for privacy-preserving, distributed multi-task machine learning that includes three supervised and one unsupervised algorithms. First, we derive the theoretical properties of these methods and the relevant machine learning workflows to ensure the validity of the software implementation. Second, we implement dsMTL as a library for the R programming language, building on the DataSHIELD platform that supports the federated analysis of sensitive individual-level data. Third, we demonstrate the applicability of dsMTL for comorbidity modeling in distributed data. We show that comorbidity modeling using dsMTL outperformed conventional, federated machine learning, as well as the aggregation of multiple models built on the distributed datasets individually. The application of dsMTL was computationally efficient and highly scalable when applied to moderate-size (n < 500), real expression data given the actual network latency. AVAILABILITY dsMTL is freely available at https://github.com/transbioZI/dsMTLBase (server-side package) and https://github.com/transbioZI/dsMTLClient (client-side package). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Han Cao
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Youcheng Zhang
- Health Data Science Unit, Medical Faculty Heidelberg & BioQuant, Heidelberg, 69120, Germany
| | - Jan Baumbach
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany.,Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - Paul R Burton
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Dominic Dwyer
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
| | - Nikolaos Koutsouleris
- Department of Psychiatry and Psychotherapy, Section for Neurodiagnostic Applications, Ludwig-Maximilian University, Munich 80638, Germany
| | - Julian Matschinske
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | | | - Sivanesan Rajan
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Thilo Rieg
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| | - Patricia Ryser-Welch
- Population Health Sciences Institute, Newcastle University, Newcastle upon Tyne, United Kingdom
| | - Julian Späth
- Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | | | - Carl Herrmann
- Health Data Science Unit, Medical Faculty Heidelberg & BioQuant, Heidelberg, 69120, Germany
| | - Emanuel Schwarz
- Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
| |
Collapse
|
9
|
Späth J, Matschinske J, Kamanu FK, Murphy SA, Zolotareva O, Bakhtiari M, Antman EM, Loscalzo J, Brauneck A, Schmalhorst L, Buchholtz G, Baumbach J. Privacy-aware multi-institutional time-to-event studies. PLOS DIGITAL HEALTH 2022; 1:e0000101. [PMID: 36812603 PMCID: PMC9931301 DOI: 10.1371/journal.pdig.0000101] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Accepted: 08/06/2022] [Indexed: 11/19/2022]
Abstract
Clinical time-to-event studies are dependent on large sample sizes, often not available at a single institution. However, this is countered by the fact that, particularly in the medical field, individual institutions are often legally unable to share their data, as medical data is subject to strong privacy protection due to its particular sensitivity. But the collection, and especially aggregation into centralized datasets, is also fraught with substantial legal risks and often outright unlawful. Existing solutions using federated learning have already demonstrated considerable potential as an alternative for central data collection. Unfortunately, current approaches are incomplete or not easily applicable in clinical studies owing to the complexity of federated infrastructures. This work presents privacy-aware and federated implementations of the most used time-to-event algorithms (survival curve, cumulative hazard rate, log-rank test, and Cox proportional hazards model) in clinical trials, based on a hybrid approach of federated learning, additive secret sharing, and differential privacy. On several benchmark datasets, we show that all algorithms produce highly similar, or in some cases, even identical results compared to traditional centralized time-to-event algorithms. Furthermore, we were able to reproduce the results of a previous clinical time-to-event study in various federated scenarios. All algorithms are accessible through the intuitive web-app Partea (https://partea.zbh.uni-hamburg.de), offering a graphical user interface for clinicians and non-computational researchers without programming knowledge. Partea removes the high infrastructural hurdles derived from existing federated learning approaches and removes the complexity of execution. Therefore, it is an easy-to-use alternative to central data collection, reducing bureaucratic efforts but also the legal risks associated with the processing of personal data to a minimum.
Collapse
Affiliation(s)
- Julian Späth
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- * E-mail:
| | - Julian Matschinske
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Frederick K. Kamanu
- TIMI Study Group, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Sabina A. Murphy
- TIMI Study Group, Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Olga Zolotareva
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Chair of Proteomics and Bioanalytics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
| | - Mohammad Bakhtiari
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Elliott M. Antman
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Joseph Loscalzo
- Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
| | - Alissa Brauneck
- Faculty of Legal Sciences, University of Hamburg, Hamburg, Germany
| | | | | | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| |
Collapse
|