Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Shi H, Jiang C, Dai W, Jiang X, Tang Y, Ohno-Machado L, Wang S. Secure Multi-pArty Computation Grid LOgistic REgression (SMAC-GLORE). BMC Med Inform Decis Mak 2016;16 Suppl 3:89. [PMID: 27454168 PMCID: PMC4959358 DOI: 10.1186/s12911-016-0316-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/10/2022] Open

Cited by Other Article(s)

Aherrahrou N, Tairi H, Aherrahrou Z. Genomic privacy preservation in genome-wide association studies: taxonomy, limitations, challenges, and vision. Brief Bioinform 2024;25:bbae356. [PMID: 39073827 DOI: 10.1093/bib/bbae356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Revised: 06/19/2024] [Accepted: 07/12/2024] [Indexed: 07/30/2024] Open

Dong X, Lu Y, Guo L, Li C, Ni Q, Wu B, Wang H, Yang L, Wu S, Sun Q, Zheng H, Zhou W, Wang S. PICOTEES: a privacy-preserving online service of phenotype exploration for genetic-diagnostic variants from Chinese children cohorts. J Genet Genomics 2024;51:243-251. [PMID: 37714454 DOI: 10.1016/j.jgg.2023.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2023] [Revised: 08/31/2023] [Accepted: 09/03/2023] [Indexed: 09/17/2023]

Affiliation(s)

Xinran Dong Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
Yulan Lu Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
Lanting Guo Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China
Chuan Li Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China
Qi Ni Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
Bingbing Wu Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
Huijun Wang Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
Lin Yang Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Key Laboratory of Birth Defects, Children's Hospital of Fudan University, Shanghai 201102, China
Songyang Wu The Third Research Institute of the Ministry of Public Security, Shanghai 200031, China
Qi Sun Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China
Hao Zheng Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China
Wenhao Zhou Center for Molecular Medicine, Children's Hospital of Fudan University, Shanghai 201102, China; Xiamen Campus of Children's Hospital of Fudan University, Xiamen, Fujian 361006, China.
Shuang Wang Department of Bioinformatics, Hangzhou Nuowei Information Technology Co., Ltd, Hangzhou, Zhejiang 310000, China; Institutes for Systems Genetics, West China Hospital, Chengdu, Sichuan 610041, China; Shanghai Putuo People's Hospital, Tongji University, Shanghai 200060, China.

Wang X, Dervishi L, Li W, Ayday E, Jiang X, Vaidya J. Privacy-preserving federated genome-wide association studies via dynamic sampling. Bioinformatics 2023;39:btad639. [PMID: 37856329 PMCID: PMC10612407 DOI: 10.1093/bioinformatics/btad639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 09/15/2023] [Accepted: 10/18/2023] [Indexed: 10/21/2023] Open

Bon JJ, Bretherton A, Buchhorn K, Cramb S, Drovandi C, Hassan C, Jenner AL, Mayfield HJ, McGree JM, Mengersen K, Price A, Salomone R, Santos-Fernandez E, Vercelloni J, Wang X. Being Bayesian in the 2020s: opportunities and challenges in the practice of modern applied Bayesian statistics. PHILOSOPHICAL TRANSACTIONS. SERIES A, MATHEMATICAL, PHYSICAL, AND ENGINEERING SCIENCES 2023;381:20220156. [PMID: 36970822 PMCID: PMC10041356 DOI: 10.1098/rsta.2022.0156] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/22/2022] [Accepted: 01/06/2023] [Indexed: 06/18/2023]

Affiliation(s)

Joshua J. Bon Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Adam Bretherton Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Katie Buchhorn Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Susanna Cramb Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Public Health and Social Work, Queensland University of Technology, Brisbane, Queensland, Australia
Christopher Drovandi Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Conor Hassan Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Adrianne L. Jenner Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Helen J. Mayfield Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Public Health, The University of Queensland, Saint Lucia, Queensland, Australia
James M. McGree Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Kerrie Mengersen Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Aiden Price Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Robert Salomone Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Computer Science, Queensland University of Technology, Brisbane, Queensland, Australia
Edgar Santos-Fernandez Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Julie Vercelloni Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia
Xiaoyu Wang Centre for Data Science, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Queensland University of Technology, Brisbane, Queensland, Australia

Wirth FN, Kussel T, Müller A, Hamacher K, Prasser F. EasySMPC: a simple but powerful no-code tool for practical secure multiparty computation. BMC Bioinformatics 2022;23:531. [PMID: 36494612 PMCID: PMC9733077 DOI: 10.1186/s12859-022-05044-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/31/2022] [Accepted: 11/08/2022] [Indexed: 12/13/2022] Open

Abstract

BACKGROUND

Modern biomedical research is data-driven and relies heavily on the re-use and sharing of data. Biomedical data, however, is subject to strict data protection requirements. Due to the complexity of the data required and the scale of data use, obtaining informed consent is often infeasible. Other methods, such as anonymization or federation, in turn have their own limitations. Secure multi-party computation (SMPC) is a cryptographic technology for distributed calculations, which brings formally provable security and privacy guarantees and can be used to implement a wide range of analytical approaches. As a relatively new technology, SMPC is still rarely used in real-world biomedical data sharing activities due to several barriers, including its technical complexity and lack of usability.

RESULTS

To overcome these barriers, we have developed the tool EasySMPC, which is implemented in Java as a cross-platform, stand-alone desktop application provided as open-source software. The tool makes use of the SMPC method Arithmetic Secret Sharing, which allows pre-defined sets of variables to be securely summed among different parties in two rounds of communication (input sharing and output reconstruction), and integrates this method into a graphical user interface. No additional software services need to be set up or configured, as EasySMPC uses the most widespread digital communication channel available: e-mail. No cryptographic keys need to be exchanged between the parties, and e-mails are exchanged automatically by the software. To demonstrate the practicability of our solution, we evaluated its performance in a wide range of data sharing scenarios. The results of our evaluation show that our approach is scalable (summing up 10,000 variables between 20 parties takes less than 300 s) and that the number of participants is the essential factor.

CONCLUSIONS

We have developed an easy-to-use "no-code solution" for performing secure joint calculations on biomedical data using SMPC protocols, which is suitable for use by scientists without IT expertise and which has no special infrastructure requirements. We believe that innovative approaches to data sharing with SMPC are needed to foster the translation of complex protocols into practice.
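The two-round secure sum described in the abstract above (input sharing, then output reconstruction) can be illustrated with a minimal additive secret sharing sketch. This is not EasySMPC's code: the function names `share` and `secure_sum`, the modulus, and the in-memory "message passing" are all assumptions made for illustration, and the e-mail transport is omitted entirely.

```python
import secrets

# Assumed modulus for illustration; it only needs to exceed the true sum.
MODULUS = 2**61 - 1


def share(value, n_parties, modulus=MODULUS):
    """Split `value` into n_parties random shares that add up to it mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the value.
    last = (value - sum(shares)) % modulus
    return shares + [last]


def secure_sum(private_inputs, modulus=MODULUS):
    """Simulate a secure sum among len(private_inputs) parties in one process."""
    n = len(private_inputs)
    # Round 1 (input sharing): party i splits its input and sends share j to party j.
    all_shares = [share(v, n, modulus) for v in private_inputs]
    # Each party j locally adds the shares it received; a single share reveals nothing.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Round 2 (output reconstruction): partial sums are exchanged and added up.
    return sum(partial_sums) % modulus


if __name__ == "__main__":
    # Hypothetical per-site counts held by three parties.
    print(secure_sum([12, 30, 7]))
```

Each party only ever sees uniformly random shares of the other parties' inputs, so nothing beyond the final total is disclosed as long as the parties follow the protocol.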

Torkzadehmahani R, Nasirigerdeh R, Blumenthal DB, Kacprowski T, List M, Matschinske J, Spaeth J, Wenke NK, Baumbach J. Privacy-Preserving Artificial Intelligence Techniques in Biomedicine. Methods Inf Med 2022;61:e12-e27. [PMID: 35062032 PMCID: PMC9246509 DOI: 10.1055/s-0041-1740630] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/22/2021] [Accepted: 09/18/2021] [Indexed: 12/15/2022]

Ghavamipour AR, Turkmen F, Jiang X. Privacy-preserving logistic regression with secret sharing. BMC Med Inform Decis Mak 2022;22:89. [PMID: 35366870 PMCID: PMC8977014 DOI: 10.1186/s12911-022-01811-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/26/2021] [Accepted: 02/22/2022] [Indexed: 11/10/2022] Open

Kamphorst B, Rooijakkers T, Veugen T, Cellamare M, Knoors D. Accurate training of the Cox proportional hazards model on vertically-partitioned data while preserving privacy. BMC Med Inform Decis Mak 2022;22:49. [PMID: 35209883 PMCID: PMC8867891 DOI: 10.1186/s12911-022-01771-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/08/2021] [Accepted: 01/20/2022] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Analysing distributed medical data is challenging because of data sensitivity and various regulations to access and combine data. Some privacy-preserving methods are known for analyzing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, dealing with data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained.

METHODS

We use a Newton-Raphson solver to securely train the CPH model with MPC, jointly with all data holders, without revealing any sensitive data. Securely computing the log-partial likelihood in each iteration poses several technical challenges for preserving the efficiency and security of our solution. To tackle these challenges, we generalize a cryptographic protocol for securely computing the inverse of the Hessian matrix and develop a new method for securely computing exponentiations. A theoretical complexity estimate is given to provide insight into the computational and communication effort that is needed.

RESULTS

Our secure solution is implemented in a setting with three different machines, each representing a different data holder, which can communicate through the internet. The MPyC platform is used for implementing this privacy-preserving solution to obtain the CPH model. We test the accuracy and computation time of our methods on three standard benchmark survival datasets. We identify future work to make our solution more efficient.

CONCLUSIONS

Our secure solution is comparable with the standard, non-secure solver in terms of accuracy and convergence speed. The computation time is considerably larger, although the theoretical complexity is still cubic in the number of covariates and quadratic in the number of subjects. We conclude that this is a promising way of performing parametric survival analysis on vertically-distributed medical data, while realising a high level of security and privacy.
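For orientation, the quantities named in the abstract above (log-partial likelihood, exponentiations, inverse Hessian) appear in the textbook, non-secure Newton-Raphson iteration for the Cox model as follows. The notation is assumed here and is not taken from the paper: $x_i$ are covariates, $\delta_i$ is the event indicator, $R(t_i)$ is the risk set at time $t_i$, and tied event times are ignored for simplicity.

```latex
% Cox log-partial likelihood (no ties); symbols are assumptions, not the paper's notation
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \left[ x_i^{\top}\beta - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right) \right]

% Newton-Raphson update, with gradient \nabla\ell and Hessian H = \nabla^2 \ell
\beta^{(k+1)} = \beta^{(k)} - H\!\left(\beta^{(k)}\right)^{-1} \nabla \ell\!\left(\beta^{(k)}\right)
```

The $\exp(\cdot)$ terms and the inversion of $H$ are the steps that, according to the abstract, required a new secure exponentiation method and a generalized secure matrix-inverse protocol.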

Nasirigerdeh R, Torkzadehmahani R, Matschinske J, Frisch T, List M, Späth J, Weiss S, Völker U, Pitkänen E, Heider D, Wenke NK, Kaissis G, Rueckert D, Kacprowski T, Baumbach J. sPLINK: a hybrid federated tool as a robust alternative to meta-analysis in genome-wide association studies. Genome Biol 2022;23:32. [PMID: 35073941 PMCID: PMC8785575 DOI: 10.1186/s13059-021-02562-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/25/2020] [Accepted: 12/02/2021] [Indexed: 11/10/2022] Open

Affiliation(s)

Reza Nasirigerdeh AI in Medicine and Healthcare, Technical University of Munich, Munich, Germany; Klinikum rechts der Isar, Munich, Germany
Reihaneh Torkzadehmahani AI in Medicine and Healthcare, Technical University of Munich, Munich, Germany
Julian Matschinske Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
Tobias Frisch Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
Markus List Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
Julian Späth Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
Stefan Weiss Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
Uwe Völker Department of Functional Genomics, University Medicine Greifswald, Greifswald, Germany
Esa Pitkänen Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland; Applied Tumor Genomics Research Program, Research Programs Unit, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Dominik Heider Department of Mathematics and Computer Science, University of Marburg, Marburg, Germany
Nina Kerstin Wenke Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
Georgios Kaissis AI in Medicine and Healthcare, Technical University of Munich, Munich, Germany; Klinikum rechts der Isar, Munich, Germany; Biomedical Image Analysis Group, Imperial College London, London, UK; OpenMined, Oxford, UK
Daniel Rueckert AI in Medicine and Healthcare, Technical University of Munich, Munich, Germany; Klinikum rechts der Isar, Munich, Germany; Biomedical Image Analysis Group, Imperial College London, London, UK
Tim Kacprowski Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Hannover Medical School, Brunswick, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Brunswick, Germany
Jan Baumbach Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany; Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark

Spini G, Mancini E, Attema T, Abspoel M, de Gier J, Fehr S, Veugen T, van Heesch M, Worm D, De Luca A, Cramer R, Sloot PM. New Approach to Privacy-Preserving Clinical Decision Support Systems for HIV Treatment. J Med Syst 2022;46:84. [PMID: 36261621 PMCID: PMC9581834 DOI: 10.1007/s10916-022-01851-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2020] [Revised: 08/09/2022] [Accepted: 08/16/2022] [Indexed: 01/04/2023]

Affiliation(s)

Gabriele Spini Applied Cryptography and Quantum Algorithms, TNO, 96800, 2509 JE Postbus, The Hague, The Netherlands
Emiliano Mancini Institute for Advanced Study, University of Amsterdam, Oude Turfmarkt 147, 1012 GC Amsterdam, The Netherlands; Department of Global Health, Amsterdam UMC, Location AMC, 1105 AZ Amsterdam, The Netherlands; Data Science Institute, Hasselt University, Diepenbeek, Belgium
Thomas Attema Applied Cryptography and Quantum Algorithms, TNO, 96800, 2509 JE Postbus, The Hague, The Netherlands; Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
Mark Abspoel Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Philips Research, High Tech Campus 34, 5656 AE Eindhoven, The Netherlands
Jan de Gier Applied Cryptography and Quantum Algorithms, TNO, 96800, 2509 JE Postbus, The Hague, The Netherlands
Serge Fehr Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
Thijs Veugen Applied Cryptography and Quantum Algorithms, TNO, 96800, 2509 JE Postbus, The Hague, The Netherlands; Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands
Maran van Heesch Applied Cryptography and Quantum Algorithms, TNO, 96800, 2509 JE Postbus, The Hague, The Netherlands
Daniël Worm Applied Cryptography and Quantum Algorithms, TNO, 96800, 2509 JE Postbus, The Hague, The Netherlands
Andrea De Luca Department of Medical Biotechnologies, University of Siena and Siena University Hospital, Viale Mario Bracci 16, 53100 Siena, Italy
Ronald Cramer Cryptology Group, CWI, P.O. Box 94079, 1090 GB Amsterdam, The Netherlands; Mathematical Institute, Leiden University, P.O. Box 9512, 2300 RA Leiden, The Netherlands
Peter M.A. Sloot Institute for Advanced Study, University of Amsterdam, Oude Turfmarkt 147, 1012 GC Amsterdam, The Netherlands; Complexity Institute, Nanyang Technological University, Academic Building North, Level 1 Section B Unit No. 7 (ABN-01B-07), 61 Nanyang Drive, 637335 Singapore, Singapore; Advanced Computing, ITMO University, Lomonosova street 9, 191002 Saint Petersburg, Russia

van Egmond MB, Spini G, van der Galien O, IJpma A, Veugen T, Kraaij W, Sangers A, Rooijakkers T, Langenkamp P, Kamphorst B, van de L'Isle N, Kooij-Janic M. Privacy-preserving dataset combination and Lasso regression for healthcare predictions. BMC Med Inform Decis Mak 2021;21:266. [PMID: 34530824 PMCID: PMC8445286 DOI: 10.1186/s12911-021-01582-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/05/2021] [Accepted: 06/29/2021] [Indexed: 11/12/2022] Open

Multi-Party Privacy-Preserving Logistic Regression with Poor Quality Data Filtering for IoT Contributors. ELECTRONICS 2021. [DOI: 10.3390/electronics10172049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]

Wirth FN, Meurers T, Johns M, Prasser F. Privacy-preserving data sharing infrastructures for medical research: systematization and comparison. BMC Med Inform Decis Mak 2021;21:242. [PMID: 34384406 PMCID: PMC8359765 DOI: 10.1186/s12911-021-01602-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/07/2021] [Accepted: 07/31/2021] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Data sharing is considered a crucial part of modern medical research. Unfortunately, despite its advantages, it often faces obstacles, especially data privacy challenges. As a result, various approaches and infrastructures have been developed that aim to ensure that patients and research participants remain anonymous when data is shared. However, privacy protection typically comes at a cost, e.g. restrictions regarding the types of analyses that can be performed on shared data. What is lacking is a systematization making the trade-offs taken by different approaches transparent. The aim of the work described in this paper was to develop a systematization for the degree of privacy protection provided and the trade-offs taken by different data sharing methods. Based on this contribution, we categorized popular data sharing approaches and identified research gaps by analyzing combinations of promising properties and features that are not yet supported by existing approaches.

METHODS

The systematization consists of different axes. Three axes relate to privacy protection aspects and were adopted from the popular Five Safes Framework: (1) safe data, addressing privacy at the input level, (2) safe settings, addressing privacy during shared processing, and (3) safe outputs, addressing privacy protection of analysis results. Three additional axes address the usefulness of approaches: (4) support for de-duplication, to enable the reconciliation of data belonging to the same individuals, (5) flexibility, to be able to adapt to different data analysis requirements, and (6) scalability, to maintain performance with increasing complexity of shared data or common analysis processes.

RESULTS

Using the systematization, we identified three different categories of approaches: distributed data analyses, which exchange anonymous aggregated data, secure multi-party computation protocols, which exchange encrypted data, and data enclaves, which store pooled individual-level data in secure environments for access for analysis purposes. We identified important research gaps, including a lack of approaches enabling the de-duplication of horizontally distributed data or providing a high degree of flexibility.

CONCLUSIONS

There are fundamental differences between different data sharing approaches and several gaps in their functionality that may be interesting to investigate in future work. Our systematization can make the properties of privacy-preserving data sharing infrastructures more transparent and support decision makers and regulatory authorities with a better understanding of the trade-offs taken.

Dong X, Randolph DA, Weng C, Kho AN, Rogers JM, Wang X. Developing High Performance Secure Multi-Party Computation Protocols in Healthcare: A Case Study of Patient Risk Stratification. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2021;2021:200-209. [PMID: 34457134 PMCID: PMC8378657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]

Park JA, Sung MD, Kim HH, Park YR. Weight-Based Framework for Predictive Modeling of Multiple Databases With Noniterative Communication Without Data Sharing: Privacy-Protecting Analytic Method for Multi-Institutional Studies. JMIR Med Inform 2021;9:e21043. [PMID: 33818396 PMCID: PMC8056295 DOI: 10.2196/21043] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2020] [Revised: 11/16/2020] [Accepted: 03/03/2021] [Indexed: 01/22/2023] Open

Abstract

Background

Securing the representativeness of study populations is crucial in biomedical research to ensure high generalizability. In this regard, using multi-institutional data has advantages in medicine. However, combining data physically is difficult because the confidential nature of biomedical data raises privacy issues. Therefore, a methodological approach is necessary when using multi-institutional medical data for research, so that a model can be developed without sharing data between institutions.

Objective

This study aims to develop a weight-based integrated predictive model of multi-institutional data, which does not require iterative communication between institutions, to improve average predictive performance by increasing the generalizability of the model under privacy-preserving conditions without sharing patient-level data.

Methods

The weight-based integrated model generates a weight for each institutional model and builds an integrated model for multi-institutional data based on these weights. We performed 3 simulations to show the weight characteristics and to determine the number of repetitions of the weight required to obtain stable values. We also conducted an experiment using real multi-institutional data to verify the developed weight-based integrated model. We selected 10 hospitals (2845 intensive care unit [ICU] stays in total) from the electronic intensive care unit Collaborative Research Database to predict ICU mortality with 11 features. To evaluate the validity of our model, compared with a centralized model, which was developed by combining all the data of 10 hospitals, we used proportional overlap (ie, 0.5 or less indicates a significant difference at a level of .05; and 2 indicates 2 CIs overlapping completely). Standard and Firth logistic regression models were applied for the 2 simulations and the experiment.

Results

The results of these simulations indicate that the weight of each institution is determined by 2 factors (ie, the data size of each institution and how well each institutional model fits into the overall institutional data) and that repeatedly generating 200 weights is necessary per institution. In the experiment, the estimated area under the receiver operating characteristic curve (AUC) and 95% CIs were 81.36% (79.37%-83.36%) and 81.95% (80.03%-83.87%) in the centralized model and weight-based integrated model, respectively. The proportional overlap of the CIs for AUC in both the weight-based integrated model and the centralized model was approximately 1.70, and that of overlap of the 11 estimated odds ratios was over 1, except for 1 case.

Conclusions

In the experiment where real multi-institutional data were used, our model showed similar results to the centralized model without iterative communication between institutions. In addition, our weight-based integrated model provided a weighted average model by integrating 10 models that were overfitted or underfitted compared with the centralized model. The proposed weight-based integrated model is expected to provide an efficient distributed research approach as it increases the generalizability of the model and does not require iterative communication.

Kuo TT, Gabriel RA, Cidambi KR, Ohno-Machado L. EXpectation Propagation LOgistic REgRession on permissioned blockCHAIN (ExplorerChain): decentralized online healthcare/genomics predictive model learning. J Am Med Inform Assoc 2021;27:747-756. [PMID: 32364235 PMCID: PMC7309256 DOI: 10.1093/jamia/ocaa023] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 08/23/2019] [Revised: 02/11/2020] [Accepted: 02/24/2020] [Indexed: 11/19/2022] Open

Abstract

Objective

Predicting patient outcomes using healthcare/genomics data is an increasingly popular/important area. However, some diseases are rare and require data from multiple institutions to construct generalizable models. To address institutional data protection policies, many distributed methods keep the data locally but rely on a central server for coordination, which introduces risks such as a single point of failure. We focus on providing an alternative based on a decentralized approach. We introduce the idea of using blockchain technology for this purpose, with a brief description of its own potential advantages/disadvantages.

Materials and Methods

We explain how our proposed EXpectation Propagation LOgistic REgRession on Permissioned blockCHAIN (ExplorerChain) can achieve the same results when compared to a distributed model that uses a central server on 3 healthcare/genomic datasets, and what trade-offs need to be considered when using centralized/decentralized methods. We explain how the use of blockchain technology can help decrease some of the problems encountered in decentralized methods.

Results

We showed that the discrimination power of ExplorerChain can be statistically similar to that of its central server-based counterpart. While ExplorerChain inherited some benefits of blockchain, it had a slightly increased running time.

Discussion

ExplorerChain has the same prerequisites as a distributed model with a centralized server for coordination. In a manner similar to secure multi-party computation strategies, it assumes that participating institutions are honest, but “curious.”

Conclusion

When evaluated on relatively small datasets, results suggest that ExplorerChain, which combines artificial intelligence and blockchain technologies, performs as well as a central server-based method, and may avoid some risks at the cost of efficiency.

Kuo TT. The anatomy of a distributed predictive modeling framework: online learning, blockchain network, and consensus algorithm. JAMIA Open 2020;3:201-208. [PMID: 32734160 PMCID: PMC7382618 DOI: 10.1093/jamiaopen/ooaa017] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Revised: 04/21/2020] [Accepted: 04/29/2020] [Indexed: 11/23/2022] Open

Abstract

Objective

Cross-institutional distributed healthcare/genomic predictive modeling is an emerging technology that meets both the need to build more generalizable models and the need to protect patient data, by exchanging only the models and not the patient data. In this article, the implementation details are presented for one specific blockchain-based approach, ExplorerChain, from a software development perspective. The healthcare/genomic use cases of myocardial infarction, cancer biomarker, and length of hospitalization after surgery are also described.

Materials and Methods

ExplorerChain’s 3 main technical components, namely online machine learning, metadata of transaction, and the Proof-of-Information-Timed (PoINT) algorithm, are introduced in this study. Specifically, the 3 algorithms (ie, core, new network, and new site/data) are described in detail.

Results

ExplorerChain was implemented, and its design details were illustrated, especially the development configurations in a practical setting. The system architecture and programming languages are also introduced. The code was released in an open source repository available at https://github.com/tsungtingkuo/explorerchain.

Discussion

The design considerations of the semi-trust assumption, data format normalization, and non-determinism were discussed. The limitations of the implementation include a fixed number of participating sites, limited join-or-leave capability during initialization, the absence of advanced privacy technology, and the need for further investigation of ethical, legal, and social implications.

Conclusion

This study can serve as a reference for researchers who would like to implement and even deploy blockchain technology. Furthermore, the off-the-shelf software can serve as a cornerstone to accelerate the development and investigation of future healthcare/genomic blockchain studies.

Wu X, Zheng H, Dou Z, Chen F, Deng J, Chen X, Xu S, Gao G, Li M, Wang Z, Xiao Y, Xie K, Wang S, Xu H. A novel privacy-preserving federated genome-wide association study framework and its application in identifying potential risk variants in ankylosing spondylitis. Brief Bioinform 2020;22:5860679. [PMID: 32591779 DOI: 10.1093/bib/bbaa090] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/26/2020] [Revised: 04/05/2020] [Accepted: 04/24/2020] [Indexed: 11/13/2022] Open

Scott ER, Wallsten RL. A Look to the Future. Pharmacogenomics 2019. [DOI: 10.1016/b978-0-12-812626-4.00010-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022] Open

Jiang Y, Hamer J, Wang C, Jiang X, Kim M, Song Y, Xia Y, Mohammed N, Sadat MN, Wang S. SecureLR: Secure Logistic Regression Model via a Hybrid Cryptographic Protocol. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019;16:113-123. [PMID: 29994005 DOI: 10.1109/tcbb.2018.2833463] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Secure top most significant genome variants search: iDASH 2017 competition. BMC Med Genomics 2018;11:82. [PMID: 30309361 PMCID: PMC6180353 DOI: 10.1186/s12920-018-0399-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023] Open

Gibson JE, Ander EL, Cave M, Bath-Hextall F, Musah A, Leonardi-Bee J. Linkage of national soil quality measurements to primary care medical records in England and Wales: a new resource for investigating environmental impacts on human health. Popul Health Metr 2018;16:12. [PMID: 30012161 PMCID: PMC6048879 DOI: 10.1186/s12963-018-0168-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2016] [Accepted: 06/19/2018] [Indexed: 12/02/2022] Open

Abstract

Background

Long-term, low-level exposure to toxic elements in soil may be harmful to human health but large longitudinal cohort studies with sufficient follow-up time to study these effects are cost-prohibitive and impractical. Linkage of routinely collected medical outcome data to systematic surveys of soil quality may offer a viable alternative.

Methods

We used the Geochemical Baseline Survey of the Environment (G-BASE), a systematic X-ray fluorescence survey of soil inorganic chemistry throughout England and Wales to obtain estimates of the concentrations of 15 elements in the soil contained within each English and Welsh postcode area. We linked these data to the residential postcodes of individuals enrolled in The Health Improvement Network (THIN), a large database of UK primary care medical records, to provide estimates of exposure. Observed exposure levels among the THIN population were compared with expectations based on UK population estimates to assess representativeness.

Results

Three hundred seventy-seven of three hundred ninety-five English and Welsh THIN practices agreed to participate in the linkage, providing complete residential soil metal estimates for 6,243,363 individuals (92% of all current and former patients) with a mean period of prospective computerised medical data collection (follow-up) of 6.75 years. Overall agreement between the THIN population and expectations was excellent; however, the number of participating practices in the Yorkshire & Humber strategic health authority was low, leading to restricted ranges of measurements for some elements relative to the known variations in geochemical concentrations in this area.

Conclusions

The linked database provides unprecedented population size and statistical power to study the effects of elements in soil on human health. With appropriate adjustment, results should be generalizable to and representative of the wider English and Welsh population.

Chenghong W, Jiang Y, Mohammed N, Chen F, Jiang X, Al Aziz MM, Sadat MN, Wang S. SCOTCH: Secure Counting Of encrypTed genomiC data using a Hybrid approach. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018;2017:1744-1753. [PMID: 29854245 PMCID: PMC5977689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]

Lee J, Sun J, Wang F, Wang S, Jun CH, Jiang X. Privacy-Preserving Patient Similarity Learning in a Federated Environment: Development and Analysis. JMIR Med Inform 2018;6:e20. [PMID: 29653917 PMCID: PMC5924379 DOI: 10.2196/medinform.7744] [Citation(s) in RCA: 55] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2017] [Revised: 09/12/2017] [Accepted: 01/06/2018] [Indexed: 12/14/2022] Open

Abstract

Background

There is an urgent need for the development of global analytic frameworks that can perform analyses in a privacy-preserving federated environment across multiple institutions without privacy leakage. A few studies on the topic of federated medical analysis have been conducted recently with the focus on several algorithms. However, none of them has addressed similar-patient matching, which is useful for applications such as cohort construction for cross-institution observational studies, disease surveillance, and clinical trial recruitment.

Objective

The aim of this study was to present a privacy-preserving platform in a federated setting for patient similarity learning across institutions. Without sharing patient-level information, our model can find similar patients from one hospital to another.

Methods

We proposed a federated patient hashing framework and developed a novel algorithm to learn context-specific hash codes to represent patients across institutions. The similarities between patients can be efficiently computed using the resulting hash codes of the corresponding patients. To guard against reverse-engineering attacks on the model, we applied homomorphic encryption to patient similarity search in a federated setting.

Results

We used sequential medical events extracted from the Multiparameter Intelligent Monitoring in Intensive Care-III database to evaluate the proposed algorithm in predicting the incidence of five diseases independently. Our algorithm achieved average areas under the curve of 0.9154 and 0.8012 with balanced and imbalanced data, respectively, in κ-nearest neighbor with κ=3. We also confirmed privacy preservation in similarity search by using homomorphic encryption.

Conclusions

The proposed algorithm can help search for similar patients across institutions effectively to support federated data analysis in a privacy-preserving manner.

Sadat MN, Jiang X, Aziz MMA, Wang S, Mohammed N. Secure and Efficient Regression Analysis Using a Hybrid Cryptographic Framework: Development and Evaluation. JMIR Med Inform 2018;6:e14. [PMID: 29506966 PMCID: PMC5859787 DOI: 10.2196/medinform.8286] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2017] [Revised: 10/25/2017] [Accepted: 01/03/2018] [Indexed: 11/25/2022] Open

Abstract

Background

Machine learning is an effective data-driven tool that is being widely used to extract valuable patterns and insights from data. Specifically, predictive machine learning models are very important in health care for clinical data analysis. The machine learning algorithms that generate predictive models often require pooling data from different sources to discover statistical patterns or correlations among different attributes of the input data. The primary challenge is to fulfill one major objective: preserving the privacy of individuals while discovering knowledge from data.

Objective

Our objective was to develop a hybrid cryptographic framework for performing regression analysis over distributed data in a secure and efficient way.

Methods

Existing secure computation schemes are not suitable for processing the large-scale data that are used in cutting-edge machine learning applications. We designed, developed, and evaluated a hybrid cryptographic framework that can securely perform regression analysis, a fundamental machine learning algorithm. The framework uses somewhat homomorphic encryption together with Intel Software Guard Extensions (Intel SGX), a newly introduced secure hardware component, to ensure both privacy and efficiency at the same time.

Results

Experimental results demonstrate that our proposed method provides a better trade-off between security and efficiency than solely secure hardware-based methods. Moreover, there is no approximation error: the computed model parameters are identical to the plaintext results.

Conclusions

To the best of our knowledge, a secure computation model of this kind, using a hybrid cryptographic framework that leverages both somewhat homomorphic encryption and Intel SGX, has not been proposed or evaluated to date. Our proposed framework ensures data security and computational efficiency at the same time.

Chen F, Wang C, Dai W, Jiang X, Mohammed N, Al Aziz MM, Sadat MN, Sahinalp C, Lauter K, Wang S. PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre Guard Extension. BMC Med Genomics 2017;10:48. [PMID: 28786365 PMCID: PMC5547453 DOI: 10.1186/s12920-017-0281-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/04/2022] Open

Privacy Preserving Federated Big Data Analysis. 2017. [DOI: 10.1007/978-3-319-53817-4_3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/06/2023]

Wang S, Jiang X, Singh S, Marmor R, Bonomi L, Fox D, Dow M, Ohno-Machado L. Genome privacy: challenges, technical approaches to mitigate risk, and ethical considerations in the United States. Ann N Y Acad Sci 2017;1387:73-83. [PMID: 27681358 PMCID: PMC5266631 DOI: 10.1111/nyas.13259] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2016] [Revised: 08/18/2016] [Accepted: 08/22/2016] [Indexed: 12/28/2022]

Constable SD, Tang Y, Wang S, Jiang X, Chapin S. Privacy-preserving GWAS analysis on federated genomic datasets. BMC Med Inform Decis Mak 2015;15 Suppl 5:S2. [PMID: 26733045 PMCID: PMC4699163 DOI: 10.1186/1472-6947-15-s5-s2] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023] Open