1
Suh J, Lee G, Kim JW, Shin J, Kim YJ, Lee SW, Kim S. Privacy-Preserving Prediction of Postoperative Mortality in Multi-Institutional Data: Development and Usability Study. JMIR Med Inform 2024; 12:e56893. [PMID: 38968600 DOI: 10.2196/56893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 05/07/2024] [Accepted: 06/08/2024] [Indexed: 07/07/2024]
Abstract
BACKGROUND: To circumvent regulatory barriers that limit medical data exchange due to personal information security concerns, we use homomorphic encryption (HE) technology, enabling computation on encrypted data and enhancing privacy.
OBJECTIVE: This study explores whether using HE to integrate encrypted multi-institutional data enhances predictive power in research, focusing on the integration feasibility across institutions and determining the optimal size of hospital data sets for improved prediction models.
METHODS: We used data from 341,007 individuals aged 18 years and older who underwent noncardiac surgeries across 3 medical institutions. The study focused on predicting in-hospital mortality within 30 days postoperatively, using secure logistic regression based on HE as the prediction model. We compared the predictive performance of this model using plaintext data from a single institution against a model using encrypted data from multiple institutions.
RESULTS: The predictive model using encrypted data from all 3 institutions exhibited the best performance based on area under the receiver operating characteristic curve (0.941); the model combining Asan Medical Center (AMC) and Seoul National University Hospital (SNUH) data exhibited the best predictive performance based on area under the precision-recall curve (0.132). Both Ewha Womans University Medical Center and SNUH demonstrated improvement in predictive power for their own institutions upon their respective data's addition to the AMC data.
CONCLUSIONS: Prediction models using multi-institutional data sets processed with HE outperformed those using single-institution data sets, especially when our model adaptation approach was applied, which was further validated on a smaller host hospital with a limited data set.
Affiliation(s)
- Jungyo Suh
  - Department of Urology, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Garam Lee
  - CryptoLab Inc, Seoul, Republic of Korea
- Yi-Jun Kim
  - Department of Environmental Medicine, Ewha Womans University College of Medicine, Seoul, Republic of Korea
- Sang-Wook Lee
  - Department of Anesthesiology and Pain Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Sulgi Kim
  - CryptoLab Inc, Seoul, Republic of Korea
2
Kim K, Yang H, Lee J, Lee WG. Metaverse Wearables for Immersive Digital Healthcare: A Review. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2303234. [PMID: 37740417 PMCID: PMC10625124 DOI: 10.1002/advs.202303234] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/19/2023] [Revised: 07/15/2023] [Indexed: 09/24/2023]
Abstract
The recent exponential growth of metaverse technology has been instrumental in reshaping a myriad of sectors, not least digital healthcare. This comprehensive review critically examines the landscape and future applications of metaverse wearables toward immersive digital healthcare. The key technologies and advancements that have spearheaded the metamorphosis of metaverse wearables are categorized, encapsulating all-encompassed extended reality, such as virtual reality, augmented reality, mixed reality, and other haptic feedback systems. Moreover, the fundamentals of their deployment in assistive healthcare (especially for rehabilitation), medical and nursing education, and remote patient management and treatment are investigated. The potential benefits of integrating metaverse wearables into healthcare paradigms are multifold, encompassing improved patient prognosis, enhanced accessibility to high-quality care, and high standards of practitioner instruction. Nevertheless, these technologies are not without their inherent challenges and untapped opportunities, which span privacy protection, data safeguarding, and innovation in artificial intelligence. In summary, future research trajectories and potential advancements to circumvent these hurdles are also discussed, further augmenting the incorporation of metaverse wearables within healthcare infrastructures in the post-pandemic era.
Affiliation(s)
- Kisoo Kim
  - Intelligent Optical Module Research Center, Korea Photonics Technology Institute (KOPTI), Gwangju 61007, Republic of Korea
- Hyosill Yang
  - Department of Nursing, College of Nursing Science, Kyung Hee University, Seoul 02447, Republic of Korea
- Jihun Lee
  - Department of Mechanical Engineering, College of Engineering, Kyung Hee University, Yongin 17104, Republic of Korea
- Won Gu Lee
  - Department of Mechanical Engineering, College of Engineering, Kyung Hee University, Yongin 17104, Republic of Korea
3
Om Kumar CU, Gajendran S, Balaji V, Nhaveen A, Sai Balakrishnan S. Securing health care data through blockchain enabled collaborative machine learning. Soft comput 2023; 27:9941-9954. [PMID: 37287568 PMCID: PMC10204011 DOI: 10.1007/s00500-023-08330-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2023] [Indexed: 06/09/2023]
Abstract
Transferring of data in machine learning from one party to another party is one of the issues that has been in existence since the development of technology. Health care data collection using machine learning techniques can lead to privacy issues which cause disturbances among the parties and reduces the possibility to work with either of the parties. Since centralized way of information transfer between two parties can be limited and risky as they are connected using machine learning, this factor motivated us to use the decentralized way where there is no connection but model transfer between both parties will be in process through a federated way. The purpose of this research is to investigate a model transfer between a user and the client(s) in an organization using federated learning techniques and reward the client(s) for their efforts with tokens accordingly using blockchain technology. In this research, the user shares a model to organizations that are willing to volunteer their service to provide help to the user. The model is trained and transferred among the user and the clients in the organizations in a privacy preserving way. In this research, we found that the process of model transfer between user and the volunteered organizations works completely fine with the help of federated learning techniques and the client(s) is/are rewarded with tokens for their efforts. We used the COVID-19 dataset to test the federation process, which yielded individual results of 88% for contributor a, 85% for contributor b, and 74% for contributor c. When using the FedAvg algorithm, we were able to achieve a total accuracy of 82%.
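The FedAvg step named in this abstract can be illustrated with a minimal sketch. It assumes each client returns a flat list of model weights plus its local sample count; the function name `fed_avg` and the toy numbers are hypothetical and are not taken from the paper.

```python
def fed_avg(client_weights, client_sizes):
    """Average client weight vectors, weighting each client by its sample count.

    client_weights: list of equal-length lists of floats (one per client).
    client_sizes:   list of local training-set sizes (one per client).
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: two clients, the second holding three times more data.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only the weight vectors and counts leave each client in this scheme; the raw records stay local, which is the privacy property the abstract relies on.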
Affiliation(s)
- C. U. Om Kumar
  - School of Computer Science and Engineering, Vellore Institute of Technology, Chennai Campus, Chennai, India
- Sudhakaran Gajendran
  - School of Electronics Engineering, Vellore Institute of Technology, Chennai Campus, Chennai, India
- V. Balaji
  - Department of Computer Science and Engineering, SRM Easwari Engineering College, Chennai, Tamil Nadu, India
- A. Nhaveen
  - Department of Computer Science and Engineering, SRM Easwari Engineering College, Chennai, Tamil Nadu, India
- S. Sai Balakrishnan
  - Department of Computer Science and Engineering, SRM Easwari Engineering College, Chennai, Tamil Nadu, India
4
Falda M, Atzori M, Corbetta M. Semantic wikis as flexible database interfaces for biomedical applications. Sci Rep 2023; 13:1095. [PMID: 36658254 PMCID: PMC9851594 DOI: 10.1038/s41598-023-27743-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2022] [Accepted: 01/06/2023] [Indexed: 01/20/2023]
Abstract
Several challenges prevent extracting knowledge from biomedical resources, including data heterogeneity and the difficulty to obtain and collaborate on data and annotations by medical doctors. Therefore, flexibility in their representation and interconnection is required; it is also essential to be able to interact easily with such data. In recent years, semantic tools have been developed: semantic wikis are collections of wiki pages that can be annotated with properties and so combine flexibility and expressiveness, two desirable aspects when modeling databases, especially in the dynamic biomedical domain. However, semantics and collaborative analysis of biomedical data is still an unsolved challenge. The aim of this work is to create a tool for easing the design and the setup of semantic databases and to give the possibility to enrich them with biostatistical applications. As a side effect, this will also make them reproducible, fostering their application by other research groups. A command-line software has been developed for creating all structures required by Semantic MediaWiki. Besides, a way to expose statistical analyses as R Shiny applications in the interface is provided, along with a facility to export Prolog predicates for reasoning with external tools. The developed software allowed to create a set of biomedical databases for the Neuroscience Department of the University of Padova in a more automated way. They can be extended with additional qualitative and statistical analyses of data, including for instance regressions, geographical distribution of diseases, and clustering. The software is released as open source-code and published under the GPL-3 license at https://github.com/mfalda/tsv2swm .
Affiliation(s)
- Marco Falda
  - Neuroscience Department, University of Padova, Padova, Italy
- Manfredo Atzori
  - Neuroscience Department, University of Padova, Padova, Italy
  - Institute of Information Systems, University of Applied Sciences Western Switzerland (HES-SO Valais), Sierre, Switzerland
  - Padova Neuroscience Center (PNC), Clinica Neurologica, and Venetian Institute of Molecular Medicine, VIMM, Padova, Italy
- Maurizio Corbetta
  - Neuroscience Department, University of Padova, Padova, Italy
  - Padova Neuroscience Center (PNC), Clinica Neurologica, and Venetian Institute of Molecular Medicine, VIMM, Padova, Italy
  - Department of Neurology, Radiology, Neuroscience, Washington University School of Medicine, St. Louis, MO, USA
5
Combining Molecular, Imaging, and Clinical Data Analysis for Predicting Cancer Prognosis. Cancers (Basel) 2022; 14:cancers14133215. [PMID: 35804988 PMCID: PMC9265023 DOI: 10.3390/cancers14133215] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/27/2022] [Indexed: 02/04/2023]
Abstract
Simple Summary The rise of Big Data, the widespread use of Machine Learning, and the cheapening of omics techniques have allowed for the creation of more sophisticated and accurate models in biomedical research. This article presents the state-of-the-art predictive models of cancer prognosis that use multimodal data, considering clinical, molecular (omics and non-omics), and image data. The subject of study, the data modalities used, the data processing and modelling methods applied, the validation strategies involved, the integration strategies encompassed, and the evolution of prognostic predictive models are discussed. Finally, we discuss challenges and opportunities in this field of cancer research, with great potential impact on the clinical management of patients and, by extension, on the implementation of personalised and precision medicine. Abstract Cancer is one of the most detrimental diseases globally. Accordingly, the prognosis prediction of cancer patients has become a field of interest. In this review, we have gathered 43 state-of-the-art scientific papers published in the last 6 years that built cancer prognosis predictive models using multimodal data. We have defined the multimodality of data as four main types: clinical, anatomopathological, molecular, and medical imaging; and we have expanded on the information that each modality provides. The 43 studies were divided into three categories based on the modelling approach taken, and their characteristics were further discussed together with current issues and future trends. Research in this area has evolved from survival analysis through statistical modelling using mainly clinical and anatomopathological data to the prediction of cancer prognosis through a multi-faceted data-driven approach by the integration of complex, multimodal, and high-dimensional data containing multi-omics and medical imaging information and by applying Machine Learning and, more recently, Deep Learning techniques. 
This review concludes that cancer prognosis predictive multimodal models are capable of better stratifying patients, which can improve clinical management and contribute to the implementation of personalised medicine as well as provide new and valuable knowledge on cancer biology and its progression.
6
Privacy-Preserving Feature Selection with Fully Homomorphic Encryption. ALGORITHMS 2022. [DOI: 10.3390/a15070229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/10/2022]
Abstract
For the feature selection problem, we propose an efficient privacy-preserving algorithm. Let D, F, and C be data, feature, and class sets, respectively, where the feature value x(Fi) and the class label x(C) are given for each x∈D and Fi∈F. For a triple (D,F,C), the feature selection problem is to find a consistent and minimal subset F′⊆F, where ‘consistent’ means that, for any x,y∈D, x(C)=y(C) if x(Fi)=y(Fi) for Fi∈F′, and ‘minimal’ means that any proper subset of F′ is no longer consistent. On distributed datasets, we consider feature selection as a privacy-preserving problem: assume that semi-honest parties A and B have their own personal DA and DB. The goal is to solve the feature selection problem for DA∪DB without sacrificing their privacy. In this paper, we propose a secure and efficient algorithm based on fully homomorphic encryption, and we implement our algorithm to show its effectiveness for various practical data. The proposed algorithm is the first one that can directly simulate the CWC (Combination of Weakest Components) algorithm on ciphertext, which is one of the best performers for the feature selection problem on the plaintext.
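The definitions of "consistent" and "minimal" in this abstract can be checked directly on plaintext data. The sketch below is only an illustration of those two definitions, not the CWC algorithm and not the encrypted protocol; the function names and the toy data set are hypothetical.

```python
def is_consistent(rows, labels, features):
    """True if any two rows that agree on every selected feature share a label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[f] for f in features)
        # setdefault stores the first label seen for this key and returns it.
        if seen.setdefault(key, label) != label:
            return False
    return True


def is_minimal(rows, labels, features):
    """True if the feature set is consistent and no proper subset is.

    Consistency is monotone (adding features never breaks it), so it is
    enough to test the subsets obtained by dropping a single feature.
    """
    features = list(features)
    if not is_consistent(rows, labels, features):
        return False
    return not any(
        is_consistent(rows, labels, features[:i] + features[i + 1:])
        for i in range(len(features))
    )


# Hypothetical toy data: the label depends only on feature "b".
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}]
labels = [0, 1, 0]
```

On this toy data, `["a", "b"]` is consistent but not minimal, because `["b"]` alone already separates the classes.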
7
Munjal K, Bhatia R. A systematic review of homomorphic encryption and its contributions in healthcare industry. COMPLEX INTELL SYST 2022; 9:1-28. [PMID: 35531323 PMCID: PMC9062639 DOI: 10.1007/s40747-022-00756-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/08/2021] [Accepted: 04/15/2022] [Indexed: 11/26/2022]
Abstract
Cloud computing and cloud storage have contributed to a big shift in data processing and its use. Availability and accessibility of resources with the reduction of substantial work is one of the main reasons for the cloud revolution. With this cloud computing revolution, outsourcing applications are in great demand. The client uses the service by uploading their data to the cloud and finally gets the result by processing it. It benefits users greatly, but it also exposes sensitive data to third-party service providers. In the healthcare industry, patient health records are digital records of a patient's medical history kept by hospitals or health care providers. Patient health records are stored in data centers for storage and processing. Before doing computations on data, traditional encryption techniques decrypt the data in their original form. As a result, sensitive medical information is lost. Homomorphic encryption can protect sensitive information by allowing data to be processed in an encrypted form such that only encrypted data is accessible to service providers. In this paper, an attempt is made to present a systematic review of homomorphic cryptosystems with its categorization and evolution over time. In addition, this paper also includes a review of homomorphic cryptosystem contributions in healthcare.
Affiliation(s)
- Kundan Munjal
  - Department of Computer Science and Engineering, Punjabi University, Patiala, 147002 India
  - AIT CSE, Chandigarh University, Gharuan, Mohali 140413 India
- Rekha Bhatia
  - Department of Computer Science, Punjabi University, Patiala, 147002 India
8
Privacy-preserving genotype imputation with fully homomorphic encryption. Cell Syst 2022; 13:173-182.e3. [PMID: 34758288 PMCID: PMC8857019 DOI: 10.1016/j.cels.2021.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/21/2020] [Revised: 06/28/2021] [Accepted: 10/15/2021] [Indexed: 12/17/2022]
Abstract
Genotype imputation is the inference of unknown genotypes using known population structure observed in large genomic datasets; it can further our understanding of phenotype-genotype relationships and is useful for QTL mapping and GWASs. However, the compute-intensive nature of genotype imputation can overwhelm local servers for computation and storage. Hence, many researchers are moving toward using cloud services, raising privacy concerns. We address these concerns by developing an efficient, privacy-preserving algorithm called p-Impute. Our method uses homomorphic encryption, allowing calculations on ciphertext, thereby avoiding the decryption of private genotypes in the cloud. It is similar to k-nearest neighbor approaches, inferring missing genotypes in a genomic block based on the SNP genotypes of genetically related individuals in the same block. Our results demonstrate accuracy in agreement with the state-of-the-art plaintext solutions. Moreover, p-Impute is scalable to real-world applications as its memory and time requirements increase linearly with the increasing number of samples. p-Impute is freely available for download here: https://doi.org/10.5281/zenodo.5542001.
9
Abstract
Cryptography is traditionally considered as a main information security mechanism, providing several security services such as confidentiality, as well as data and entity authentication. This aspect is clearly relevant to the fundamental human right of privacy, in terms of securing data from eavesdropping and tampering, as well as from masquerading their origin. However, cryptography may also support several other (legal) requirements related to privacy. For example, in order to fulfil the data minimisation principle—i.e., to ensure that the personal data that are being processed are adequate and limited only to what is necessary in relation to the purposes for which they are processed—the use of advanced cryptographic techniques such as secure computations, zero-knowledge proofs or homomorphic encryption may be prerequisite. In practice though, it seems that the organisations performing personal data processing are not fully aware of such solutions, thus adopting techniques that pose risks for the rights of individuals. This paper aims to provide a generic overview of the possible cryptographic applications that suffice to address privacy challenges. In the process, we shall also state our view on the public “debate” on finding ways so as to allow law enforcement agencies to bypass the encryption of communication.
10
Privacy preservation of genome data analysis using homomorphic encryption. SERVICE ORIENTED COMPUTING AND APPLICATIONS 2021. [DOI: 10.1007/s11761-021-00326-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/27/2022]
11
Multi-Party Privacy-Preserving Logistic Regression with Poor Quality Data Filtering for IoT Contributors. ELECTRONICS 2021. [DOI: 10.3390/electronics10172049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Nowadays, the internet of things (IoT) is used to generate data in several application domains. A logistic regression, which is a standard machine learning algorithm with a wide application range, is built on such data. Nevertheless, building a powerful and effective logistic regression model requires large amounts of data. Thus, collaboration between multiple IoT participants has often been the go-to approach. However, privacy concerns and poor data quality are two challenges that threaten the success of such a setting. Several studies have proposed different methods to address the privacy concern but to the best of our knowledge, little attention has been paid towards addressing the poor data quality problems in the multi-party logistic regression model. Thus, in this study, we propose a multi-party privacy-preserving logistic regression framework with poor quality data filtering for IoT data contributors to address both problems. Specifically, we propose a new metric gradient similarity in a distributed setting that we employ to filter out parameters from data contributors with poor quality data. To solve the privacy challenge, we employ homomorphic encryption. Theoretical analysis and experimental evaluations using real-world datasets demonstrate that our proposed framework is privacy-preserving and robust against poor quality data.
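The abstract does not define its gradient similarity metric, so the following is only one plausible reading, sketched on plaintext values: each contributor's gradient is compared with the mean gradient by cosine similarity, and contributors below a threshold are dropped. The function names, the use of the mean as reference, and the threshold are all assumptions made for illustration; the paper's actual metric and its encrypted evaluation may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def filter_contributors(gradients, threshold=0.0):
    """Return indices of contributors whose gradient points roughly the same
    way as the mean gradient (assumed reference direction)."""
    dim = len(gradients[0])
    mean = [sum(g[i] for g in gradients) / len(gradients) for i in range(dim)]
    return [i for i, g in enumerate(gradients) if cosine(g, mean) > threshold]


# Hypothetical example: the third contributor pushes in the opposite direction.
kept = filter_contributors([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]])
```

A mean reference is itself skewed by bad contributors, so a more robust reference (for example a coordinate-wise median) might be preferable in practice.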
12
Sarkar E, Chielle E, Gürsoy G, Mazonka O, Gerstein M, Maniatakos M. Fast and Scalable Private Genotype Imputation Using Machine Learning and Partially Homomorphic Encryption. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2021; 9:93097-93110. [PMID: 34476144 PMCID: PMC8409799 DOI: 10.1109/access.2021.3093005] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
The recent advances in genome sequencing technologies provide unprecedented opportunities to understand the relationship between human genetic variation and diseases. However, genotyping whole genomes from a large cohort of individuals is still cost prohibitive. Imputation methods to predict genotypes of missing genetic variants are widely used, especially for genome-wide association studies. Accurate genotype imputation requires complex statistical methods. Due to the data and computing-intensive nature of the problem, imputation is increasingly outsourced, raising serious privacy concerns. In this work, we investigate solutions for fast, scalable, and accurate privacy-preserving genotype imputation using Machine Learning (ML) and a standardized homomorphic encryption scheme, Paillier cryptosystem. ML-based privacy-preserving inference has been largely optimized for computation-heavy non-linear functions in a single-output multi-class classification setting. However, having a large number of multi-class outputs per genome per individual calls for further optimizations and/or approximations specific to this application. Here we explore the effectiveness of linear models for genotype imputation to convert them to privacy-preserving equivalents using standardized homomorphic encryption schemes. Our results show that performance of our privacy-preserving genotype imputation method is equivalent to the state-of-the-art plaintext solutions, achieving up to 99% micro area under curve score, even on real-world large-scale datasets up to 80,000 targets.
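To see why a linear model suits the Paillier cryptosystem mentioned in this abstract, a toy sketch helps: Paillier ciphertexts can be added together and multiplied by plaintext constants, which is all a dot product needs. The code below uses deliberately tiny primes and non-negative integer weights; the helper names are hypothetical, it is not the authors' implementation, and it is not secure for real use.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1 (Python 3.9+)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lambda mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


def add(n, c1, c2):
    """Ciphertext product decrypts to the sum of the plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


def scale(n, c, k):
    """Ciphertext power decrypts to k times the plaintext (mod n)."""
    return pow(c, k, n * n)


def encrypted_dot(n, weights, enc_features):
    """Linear model score on encrypted features with plaintext integer weights."""
    acc = encrypt(n, 0)
    for w, c in zip(weights, enc_features):
        acc = add(n, acc, scale(n, c, w))
    return acc


# Tiny demo primes, for illustration only.
n, priv = keygen(61, 53)
enc_x = [encrypt(n, v) for v in (5, 7)]
score = decrypt(n, priv, encrypted_dot(n, [2, 3], enc_x))  # 2*5 + 3*7
```

Real-valued weights and features would need a fixed-point encoding, and results wrap modulo `n`, so the modulus must be large enough for the intended score range.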
Affiliation(s)
- Esha Sarkar
  - Tandon School of Engineering, New York University, New York, NY 11201, USA
- Eduardo Chielle
  - New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Gamze Gürsoy
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Oleg Mazonka
  - New York University Abu Dhabi, Abu Dhabi, United Arab Emirates
- Mark Gerstein
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Michail Maniatakos
  - Tandon School of Engineering, New York University, New York, NY 11201, USA
13
Yang Y, Li D. Medical Data Feature Learning Based on Probability and Depth Learning Mining: Model Development and Validation. JMIR Med Inform 2021; 9:e19055. [PMID: 33830067 PMCID: PMC8063096 DOI: 10.2196/19055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/02/2020] [Revised: 05/08/2020] [Accepted: 05/08/2020] [Indexed: 11/13/2022]
Abstract
BACKGROUND Big data technology provides unlimited potential for efficient storage, processing, querying, and analysis of medical data. Technologies such as deep learning and machine learning simulate human thinking, assist physicians in diagnosis and treatment, provide personalized health care services, and promote the use of intelligent processes in health care applications. OBJECTIVE The aim of this paper was to analyze health care data and develop an intelligent application to predict the number of hospital outpatient visits for mass health impact and analyze the characteristics of health care big data. Designing a corresponding data feature learning model will help patients receive more effective treatment and will enable rational use of medical resources. METHODS A cascaded depth model was successfully implemented by constructing a cascaded depth learning framework and by studying and analyzing the specific feature transformation, feature selection, and classifier algorithm used in the framework. To develop a medical data feature learning model based on probabilistic and deep learning mining, we mined information from medical big data and developed an intelligent application that studies the differences in medical data for disease risk assessment and enables feature learning of the related multimodal data. Thus, we propose a cascaded data feature learning model. RESULTS The depth model created in this paper is more suitable for forecasting daily outpatient volumes than weekly or monthly volumes. We believe that there are two reasons for this: on the one hand, the training data set in the daily outpatient volume forecast model is larger, so the training parameters of the model more closely fit the actual data relationship. On the other hand, the weekly and monthly outpatient volume is the cumulative daily outpatient volume; therefore, errors caused by the prediction will gradually accumulate, and the greater the interval, the lower the prediction accuracy. 
CONCLUSIONS Several data feature learning models are proposed to extract the relationships between outpatient volume data and obtain the precise predictive value of the outpatient volume, which is very helpful for the rational allocation of medical resources and the promotion of intelligent medical treatment.
Affiliation(s)
- Yuanlin Yang
  - Department of Logistics Management, West China Second University Hospital, Sichuan University, Chengdu, China
  - Key Laboratory of Obstetric and Gynecologic and Pediatric Disease and Birth Defects of Ministry of Education, Sichuan University, Chengdu, China
- Dehua Li
  - Key Laboratory of Obstetric and Gynecologic and Pediatric Disease and Birth Defects of Ministry of Education, Sichuan University, Chengdu, China
  - Quality Assessment Office, Nursing Department, West China Second University Hospital, Sichuan University, Chengdu, China
14
Enthoven D, Al-Ars Z. An Overview of Federated Deep Learning Privacy Attacks and Defensive Strategies. FEDERATED LEARNING SYSTEMS 2021. [DOI: 10.1007/978-3-030-70604-3_8] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/25/2022]
15
Thapa C, Camtepe S. Precision health data: Requirements, challenges and existing techniques for data security and privacy. Comput Biol Med 2020; 129:104130. [PMID: 33271399 DOI: 10.1016/j.compbiomed.2020.104130] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 10/01/2020] [Revised: 11/11/2020] [Accepted: 11/11/2020] [Indexed: 01/21/2023]
Abstract
Precision health leverages information from various sources, including omics, lifestyle, environment, social media, medical records, and medical insurance claims to enable personalized care, prevent and predict illness, and precise treatments. It extensively uses sensing technologies (e.g., electronic health monitoring devices), computations (e.g., machine learning), and communication (e.g., interaction between the health data centers). As health data contain sensitive private information, including the identity of patient and carer and medical conditions of the patient, proper care is required at all times. Leakage of these private information affects the personal life, including bullying, high insurance premium, and loss of job due to the medical history. Thus, the security, privacy of and trust on the information are of utmost importance. Moreover, government legislation and ethics committees demand the security and privacy of healthcare data. Besides, the public, who is the data source, always expects the security, privacy, and trust of their data. Otherwise, they can avoid contributing their data to the precision health system. Consequently, as the public is the targeted beneficiary of the system, the effectiveness of precision health diminishes. Herein, in the light of precision health data security, privacy, ethical and regulatory requirements, finding the best methods and techniques for the utilization of the health data, and thus precision health is essential. In this regard, firstly, this paper explores the regulations, ethical guidelines around the world, and domain-specific needs. Then it presents the requirements and investigates the associated challenges. Secondly, this paper investigates secure and privacy-preserving machine learning methods suitable for the computation of precision health data along with their usage in relevant health projects. 
Finally, it illustrates the best available techniques for precision health data security and privacy with a conceptual system model that enables compliance, ethics clearance, consent management, medical innovations, and developments in the health domain.
16
Comess S, Akbay A, Vasiliou M, Hines RN, Joppa L, Vasiliou V, Kleinstreuer N. Bringing Big Data to Bear in Environmental Public Health: Challenges and Recommendations. Front Artif Intell 2020; 3. [PMID: 33184612 PMCID: PMC7654840 DOI: 10.3389/frai.2020.00031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/28/2023]
Abstract
Understanding the role that the environment plays in influencing public health often involves collecting and studying large, complex data sets. There have been a number of private and public efforts to gather sufficient information and confront significant unknowns in the field of environmental public health, yet there is a persistent and largely unmet need for findable, accessible, interoperable, and reusable (FAIR) data. Even when data are readily available, the ability to create, analyze, and draw conclusions from these data using emerging computational tools, such as augmented and artificial intelligence (AI) and machine learning, requires technical skills not currently implemented on a programmatic level across research hubs and academic institutions. We argue that collaborative efforts in data curation and storage, scientific computing, and training are of paramount importance to empower researchers within environmental sciences and the broader public health community to apply AI approaches and fully realize their potential. Leaders in the field were asked to prioritize challenges in incorporating big data in environmental public health research: inconsistent implementation of FAIR principles in data collection and sharing, a lack of skilled data scientists and appropriate cyber-infrastructures, and limited understanding of possibilities and communication of benefits were among those identified. These issues are discussed, and actionable recommendations are provided.
Affiliation(s)
- Saskia Comess
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States; Department of Statistics and Data Science, Yale University, New Haven, CT, United States
- Alexia Akbay
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States; Symbrosia Inc, Kailua-Kona, HI, United States
- Melpomene Vasiliou
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States
- Ronald N Hines
  - US Environmental Protection Agency, Center for Public Health and Environmental Assessment, Research Triangle Park, NC, United States
- Lucas Joppa
  - Microsoft Corporation, AI for Earth, Redmond, WA, United States
- Vasilis Vasiliou
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States
- Nicole Kleinstreuer
  - Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, United States; National Toxicology Program Interagency Center for the Evaluation of Alternative Toxicological Methods, National Institute of Environmental Health Sciences, Research Triangle Park, NC, United States
|
17
|
Zhu D, Zhu H, Liu X, Li H, Wang F, Li H, Feng D. CREDO: Efficient and privacy-preserving multi-level medical pre-diagnosis based on ML-kNN. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2019.11.041] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/07/2023]
|
18
|
Li D, Liao X, Xiang T, Wu J, Le J. Privacy-preserving self-serviced medical diagnosis scheme based on secure multi-party computation. Comput Secur 2020. [DOI: 10.1016/j.cose.2019.101701] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 10/25/2022]
|
19
|
Soni H, Grando A, Murcko A, Diaz S, Mukundan M, Idouraine N, Karway G, Todd M, Chern D, Dye C, Whitfield MJ. State of the art and a mixed-method personalized approach to assess patient perceptions on medical record sharing and sensitivity. J Biomed Inform 2020; 101:103338. [PMID: 31726102 PMCID: PMC6952579 DOI: 10.1016/j.jbi.2019.103338] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/07/2019] [Revised: 11/07/2019] [Accepted: 11/09/2019] [Indexed: 10/25/2022]
Abstract
OBJECTIVE Sensitive health information poses risks, such as stigma and discrimination, when disclosed. Few studies have used a patient's own electronic health records (EHRs) to explore what types of information are considered sensitive and how such perceptions affect data sharing preferences. After a systematic literature review, we designed and piloted a mixed-method approach that employs an individual's own records to assess content sensitivity and preferences for granular data sharing for care and research. METHODS A systematic literature review of methodologies employed to assess data sharing willingness and perceptions on data sensitivity was conducted. A methodology was designed to organize and categorize sensitive health information from EHRs. Patients were asked permission to access their EHRs, including those available through the state's health information exchange. A semi-structured interview script with closed card sorting was designed and personalized to each participant's own EHRs using 30 items from each patient record. This mixed method combines the quantitative outcomes from the card sorting exercises with themes captured from interview audio recording analysis. RESULTS Eight publications on patients' perspectives on data sharing and sensitivity were found. Based on our systematic review, the proposed method meets a need to use EHRs to systematize the study of data privacy issues. Twenty-five patients with behavioral health conditions, English and Spanish-speaking, were recruited. On average, participants recognized 82.7% of the 30 items from their own EHRs. Participants considered mental health (76.0%), sexual and reproductive health (75.0%) and alcohol use and alcoholism (50.0%) sensitive information. Participants were willing to share information related to other addictions (100.0%), genetic data (95.8%) and general physical health information (90.5%).
CONCLUSION The findings indicate diversity in patient views on EHR sensitivity and data sharing preferences and the need for more granular and patient-centered electronic consent mechanisms to accommodate patient needs. More research is needed to validate the generalizability of the proposed methodology.
Affiliation(s)
- Hiral Soni
  - Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Adela Grando
  - Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States.
- Anita Murcko
  - Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Sabrina Diaz
  - Kinesiology, College of Health Solutions, Arizona State University, Phoenix, United States
- Madhumita Mukundan
  - Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Nassim Idouraine
  - Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- George Karway
  - Biomedical Informatics, College of Health Solutions, Arizona State University, Scottsdale, United States
- Michael Todd
  - College of Nursing and Health Innovation, Arizona State University, Phoenix, United States
- Christy Dye
  - Partners in Recovery, Phoenix, United States
|
20
|
Li T, Li X, Zhong X, Jiang N, Gao CZ. Communication-efficient outsourced privacy-preserving classification service using trusted processor. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2019.07.047] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/16/2022]
|
21
|
A systematic review on the status and progress of homomorphic encryption technologies. Journal of Information Security and Applications 2019. [DOI: 10.1016/j.jisa.2019.102362] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
|
22
|
Aziz MMA, Sadat MN, Alhadidi D, Wang S, Jiang X, Brown CL, Mohammed N. Privacy-preserving techniques of genomic data-a survey. Brief Bioinform 2019; 20:887-895. [PMID: 29121240 PMCID: PMC6585383 DOI: 10.1093/bib/bbx139] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 06/23/2017] [Revised: 09/30/2017] [Indexed: 01/10/2023] Open
Abstract
Genomic data hold salient information about the characteristics of a living organism. Throughout the past decade, pinnacle developments have given us more accurate and inexpensive methods to retrieve genome sequences of humans. However, with the advancement of genomic research, there is a growing privacy concern regarding the collection, storage and analysis of such sensitive human data. Recent results show that given some background information, it is possible for an adversary to reidentify an individual from a specific genomic data set. This can reveal the current association or future susceptibility of some diseases for that individual (and sometimes the kinship between individuals) resulting in a privacy violation. Regardless of these risks, our genomic data hold much importance in analyzing the well-being of us and the future generation. Thus, in this article, we discuss the different privacy and security-related problems revolving around human genomic data. In addition, we will explore some of the cardinal cryptographic concepts, which can bring efficacy in secure and private genomic data computation. This article will relate the gaps between these two research areas-Cryptography and Genomics.
Affiliation(s)
- Md Momin Al Aziz
  - Department of Computer Science at the University of Manitoba, Winnipeg, Canada
- Md Nazmus Sadat
  - Department of Computer Science at the University of Manitoba, Winnipeg, Canada
- Dima Alhadidi
  - Faculty of Computer Science at the University of New Brunswick, Fredericton, Canada
- Shuang Wang
  - Department of Biomedical Informatics at the University of California in San Diego, La Jolla, CA, USA
- Xiaoqian Jiang
  - Department of Biomedical Informatics at the University of California in San Diego, La Jolla, CA, USA
- Cheryl L Brown
  - Department of Political Science and Public Administration at the University of North Carolina at Charlotte, NC, USA
- Noman Mohammed
  - Department of Computer Science at the University of Manitoba, Winnipeg, Canada
|
23
|
Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective. Proceedings on Privacy Enhancing Technologies 2018. [DOI: 10.2478/popets-2019-0006] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/20/2022]
Abstract
Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy experts.
|
24
|
Abstract
Background One of the 3 tracks of iDASH Privacy & Security Workshop 2017 competition was to execute a whole genome variants search on private genomic data. Particularly, the search application was to find the top most significant SNPs (Single-Nucleotide Polymorphisms) in a database of genome records labeled with control or case. In this paper we discuss the solution submitted by our team to this competition. Methods Privacy and confidentiality of genome data had to be ensured using Intel SGX enclaves. The typical use-case of this application is the multi-party computation (each party possessing one or several genome records) of the SNPs which statistically differentiate control and case genome datasets. Results Our solution consists of two applications: (i) compress and encrypt genome files and (ii) perform genome processing (top most important SNPs search). We have opted for a horizontal treatment of genome records and heavily used parallel processing. Rust programming language was employed to develop both applications. Conclusions Execution performance of the processing applications scales well and very good performance metrics are obtained. Contest organizers selected it as the best submission amongst other received competition entries and our team was awarded the first prize on this track.
|
25
|
Abstract
BACKGROUND Logistic regression is a popular technique used in machine learning to construct classification models. Since the construction of such models is based on computing with large datasets, it is an appealing idea to outsource this computation to a cloud service. The privacy-sensitive nature of the input data requires appropriate privacy preserving measures before outsourcing it. Homomorphic encryption enables one to compute on encrypted data directly, without decryption and can be used to mitigate the privacy concerns raised by using a cloud service. METHODS In this paper, we propose an algorithm (and its implementation) to train a logistic regression model on a homomorphically encrypted dataset. The core of our algorithm consists of a new iterative method that can be seen as a simplified form of the fixed Hessian method, but with a much lower multiplicative complexity. RESULTS We test the new method on two interesting real life applications: the first application is in medicine and constructs a model to predict the probability for a patient to have cancer, given genomic data as input; the second application is in finance and the model predicts the probability of a credit card transaction to be fraudulent. The method produces accurate results for both applications, comparable to running standard algorithms on plaintext data. CONCLUSIONS This article introduces a new simple iterative algorithm to train a logistic regression model that is tailored to be applied on a homomorphically encrypted dataset. This algorithm can be used as a privacy-preserving technique to build a binary classification model and can be applied in a wide range of problems that can be modelled with logistic regression. Our implementation results show that our method can handle the large datasets used in logistic regression training.
Affiliation(s)
- Charlotte Bonte
  - imec-Cosic, Dept. Electrical Engineering, KU Leuven, Kasteelpark Arenberg 10, Leuven, Belgium
- Frederik Vercauteren
  - imec-Cosic, Dept. Electrical Engineering, KU Leuven, Kasteelpark Arenberg 10, Leuven, Belgium
|
26
|
Arellano AM, Dai W, Wang S, Jiang X, Ohno-Machado L. Privacy Policy and Technology in Biomedical Data Science. Annu Rev Biomed Data Sci 2018; 1:115-129. [PMID: 31058261 PMCID: PMC6497413 DOI: 10.1146/annurev-biodatasci-080917-013416] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 02/04/2023]
Abstract
Privacy is an important consideration when sharing clinical data, which often contain sensitive information. Adequate protection to safeguard patient privacy and to increase public trust in biomedical research is paramount. This review covers topics in policy and technology in the context of clinical data sharing. We review policy articles related to (a) the Common Rule, HIPAA privacy and security rules, and governance; (b) patients' viewpoints and consent practices; and (c) research ethics. We identify key features of the revised Common Rule and the most notable changes since its previous version. We address data governance for research in addition to the increasing emphasis on ethical and social implications. Research ethics topics include data sharing best practices, use of data from populations of low socioeconomic status (SES), recent updates to institutional review board (IRB) processes to protect human subjects' data, and important concerns about the limitations of current policies to address data deidentification. In terms of technology, we focus on articles that have applicability in real world health care applications: deidentification methods that comply with HIPAA, data anonymization approaches to satisfy well-acknowledged issues in deidentified data, encryption methods to safeguard data analyses, and privacy-preserving predictive modeling. The first two technology topics are mostly relevant to methodologies that attempt to sanitize structured or unstructured data. The third topic includes analysis on encrypted data. The last topic includes various mechanisms to build statistical models without sharing raw data.
Affiliation(s)
- April Moreno Arellano
  - Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Wenrui Dai
  - Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Shuang Wang
  - Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Xiaoqian Jiang
  - Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
- Lucila Ohno-Machado
  - Department of Biomedical Informatics, School of Medicine, University of California, San Diego, La Jolla, California 92093, USA
|
27
|
Kim M, Song Y, Wang S, Xia Y, Jiang X. Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation. JMIR Med Inform 2018; 6:e19. [PMID: 29666041 PMCID: PMC5930176 DOI: 10.2196/medinform.8805] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 08/22/2017] [Revised: 12/21/2017] [Accepted: 01/10/2018] [Indexed: 01/16/2023] Open
Abstract
Background Learning a model without accessing raw data has been an intriguing idea to security and machine learning researchers for years. In an ideal setting, we want to encrypt sensitive data to store them on a commercial cloud and run certain analyses without ever decrypting the data to preserve privacy. Homomorphic encryption technique is a promising candidate for secure data outsourcing, but it is a very challenging task to support real-world machine learning tasks. Existing frameworks can only handle simplified cases with low-degree polynomials such as linear means classifier and linear discriminative analysis. Objective The goal of this study is to provide a practical support to the mainstream learning models (eg, logistic regression). Methods We adapted a novel homomorphic encryption scheme optimized for real numbers computation. We devised (1) the least squares approximation of the logistic function for accuracy and efficiency (ie, reduce computation cost) and (2) new packing and parallelization techniques. Results Using real-world datasets, we evaluated the performance of our model and demonstrated its feasibility in speed and memory consumption. For example, it took approximately 116 minutes to obtain the training model from the homomorphically encrypted Edinburgh dataset. In addition, it gives fairly accurate predictions on the testing dataset. Conclusions We present the first homomorphically encrypted logistic regression outsourcing model based on the critical observation that the precision loss of classification models is sufficiently small so that the decision plan stays still.
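The least squares approximation of the logistic function mentioned in this abstract can be illustrated with a minimal sketch. It assumes a degree-3 polynomial fitted on the interval [-8, 8] and uses the odd symmetry of sigmoid(x) - 0.5, so only the x and x^3 terms are fitted. The interval, degree, grid size, and function names are illustrative assumptions, not details taken from the abstract.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_odd_cubic(radius=8.0, n=2001):
    """Least-squares fit of sigmoid(x) - 0.5 by a1*x + a3*x**3.

    The fit uses n equally spaced points on [-radius, radius]
    (both values are hypothetical choices for illustration).
    """
    xs = [-radius + 2.0 * radius * i / (n - 1) for i in range(n)]
    # Entries of the 2x2 normal equations for the basis {x, x^3}.
    s2 = sum(x ** 2 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    s6 = sum(x ** 6 for x in xs)
    b1 = sum(x * (sigmoid(x) - 0.5) for x in xs)
    b3 = sum(x ** 3 * (sigmoid(x) - 0.5) for x in xs)
    # Solve [[s2, s4], [s4, s6]] @ [a1, a3] = [b1, b3] by Cramer's rule.
    det = s2 * s6 - s4 * s4
    a1 = (b1 * s6 - s4 * b3) / det
    a3 = (s2 * b3 - s4 * b1) / det
    return a1, a3


def poly_sigmoid(x, a1, a3):
    """Polynomial stand-in for the sigmoid: additions and multiplications only."""
    return 0.5 + a1 * x + a3 * x ** 3


if __name__ == "__main__":
    a1, a3 = fit_odd_cubic()
    for x in (-4.0, 0.0, 4.0):
        print(x, round(sigmoid(x), 4), round(poly_sigmoid(x, a1, a3), 4))
```

A polynomial of this kind matters in the homomorphic setting because it needs only additions and multiplications; the price is an approximation error that grows toward the ends of the fitting interval.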
Affiliation(s)
- Miran Kim
  - Division of Biomedical Informatics, University of California, San Diego, San Diego, CA, United States
- Yongsoo Song
  - Department of Mathematical Sciences, Seoul National University, Seoul, Republic of Korea; Department of Computer Science and Engineering, University of California, San Diego, San Diego, CA, United States
- Shuang Wang
  - Division of Biomedical Informatics, University of California, San Diego, San Diego, CA, United States
- Yuhou Xia
  - Department of Mathematics, Princeton University, Princeton, NJ, United States
- Xiaoqian Jiang
  - Division of Biomedical Informatics, University of California, San Diego, San Diego, CA, United States
|
28
|
Privacy-Preserving Multiparty Learning for Logistic Regression. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2018. [DOI: 10.1007/978-3-030-01701-9_30] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
|
29
|
Wang S, Jiang X, Tang H, Wang X, Bu D, Carey K, Dyke SO, Fox D, Jiang C, Lauter K, Malin B, Sofia H, Telenti A, Wang L, Wang W, Ohno-Machado L. A community effort to protect genomic data sharing, collaboration and outsourcing. NPJ Genom Med 2017; 2:33. [PMID: 29263842 PMCID: PMC5677972 DOI: 10.1038/s41525-017-0036-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/30/2017] [Revised: 07/10/2017] [Accepted: 10/10/2017] [Indexed: 12/13/2022] Open
Abstract
The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on wide scales. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and scholars in ethical, legal, and social implications (ELSI) to assess the latest advances on privacy-preserving techniques for protecting human genomic data. Teams were asked to develop novel protection methods for emerging genome privacy challenges in three scenarios: Track (1) data sharing through the Beacon service of the Global Alliance for Genomics and Health. Track (2) collaborative discovery of similar genomes between two institutions; and Track (3) data outsourcing to public cloud services. The latter two tracks represent continuing themes from our 2015 competition, while the former was new and a response to a recently established vulnerability. The winning strategy for Track 1 mitigated the privacy risk by hiding approximately 11% of the variation in the database while permitting around 160,000 queries, a significant improvement over the baseline. The winning strategies in Tracks 2 and 3 showed significant progress over the previous competition by achieving multiple orders of magnitude performance improvement in terms of computational runtime and memory requirements. The outcomes suggest that applying highly optimized privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis is useful. However, the results also indicate that further efforts are needed to refine these techniques into practical solutions.
Affiliation(s)
- Shuang Wang
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
- Xiaoqian Jiang
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
- Haixu Tang
  - Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Xiaofeng Wang
  - Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Diyue Bu
  - Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Knox Carey
  - GeneCloud, Intertrust, Sunnyvale, CA 94085 USA
- Stephanie Om Dyke
  - Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC H3A 0G4 Canada
- Dov Fox
  - School of Law, University of San Diego, San Diego, CA 92110 USA
- Chao Jiang
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
- Kristin Lauter
  - Cryptography Group, Microsoft Research, San Diego, CA 92122 USA
- Bradley Malin
  - Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37203 USA
- Heidi Sofia
  - National Human Genome Research Institute, Rockville, MD 20894 USA
- Lei Wang
  - Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Wenhao Wang
  - Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
|
30
|
Chen F, Wang C, Dai W, Jiang X, Mohammed N, Al Aziz MM, Sadat MN, Sahinalp C, Lauter K, Wang S. PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre Guard Extension. BMC Med Genomics 2017; 10:48. [PMID: 28786365 PMCID: PMC5547453 DOI: 10.1186/s12920-017-0281-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 12/04/2022] Open
Abstract
Background Advances in DNA sequencing technologies have prompted a wide range of genomic applications to improve healthcare and facilitate biomedical research. However, privacy and security concerns have emerged as a challenge for utilizing cloud computing to handle sensitive genomic data. Methods We present one of the first implementations of Software Guard Extension (SGX) based securely outsourced genetic testing framework, which leverages multiple cryptographic protocols and minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. Results We compared the performance of the proposed PRESAGE framework with the state-of-the-art homomorphic encryption scheme, as well as the plaintext implementation. The experimental results demonstrated significant performance over the homomorphic encryption methods and a small computational overhead in comparison to plaintext implementation. Conclusions The proposed PRESAGE provides an alternative solution for secure and efficient genomic data outsourcing in an untrusted cloud by using a hybrid framework that combines secure hardware and multiple crypto protocols.
Affiliation(s)
- Feng Chen
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA.
- Chenghong Wang
  - Department of Computer Science, Syracuse University, Syracuse, 13244, NY, USA
- Wenrui Dai
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA
- Xiaoqian Jiang
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA
- Noman Mohammed
  - Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, MB, Canada
- Md Momin Al Aziz
  - Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, MB, Canada
- Md Nazmus Sadat
  - Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, MB, Canada
- Cenk Sahinalp
  - Department of Computer Science and Informatics, Indiana University, Bloomington, 47408, IN, USA
- Kristin Lauter
  - Cryptography Group, Microsoft Research, San Diego, 92122, CA, USA
- Shuang Wang
  - Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA
|
31
|
Abstract
BACKGROUND One of the tasks in the iDASH Secure Genome Analysis Competition in 2016 was to demonstrate the feasibility of privacy-preserving queries on homomorphically encrypted genomic data. More precisely, given a list of up to 100,000 mutations, the task was to encrypt the data using homomorphic encryption in a way that allows it to be stored securely in the cloud, and enables the data owner to query the dataset for the presence of specific mutations, without revealing any information about the dataset or the queries to the cloud. METHODS We devise a novel string matching protocol to enable privacy-preserving queries on homomorphically encrypted data. Our protocol combines state-of-the-art techniques from homomorphic encryption and private set intersection protocols to minimize the computational and communication cost. RESULTS We implemented our protocol using the homomorphic encryption library SEAL v2.1, and applied it to obtain an efficient solution to the iDASH competition task. For example, using 8 threads, our protocol achieves a running time of only 4 s, and a communication cost of 2 MB, when querying for the presence of 5 mutations from an encrypted dataset of 100,000 mutations. CONCLUSIONS We demonstrate that homomorphic encryption can be used to enable an efficient privacy-preserving mechanism for querying the presence of particular mutations in realistic size datasets. Beyond its applications to genomics, our protocol can just as well be applied to any kind of data, and is therefore of independent interest to the homomorphic encryption community.
Affiliation(s)
- Gizem S Çetin
  - Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609, USA
- Hao Chen
  - Microsoft Research, 14820 NE 36th St, Redmond, WA 98052, USA
- Kim Laine
  - Microsoft Research, 14820 NE 36th St, Redmond, WA 98052, USA.
- Kristin Lauter
  - Microsoft Research, 14820 NE 36th St, Redmond, WA 98052, USA
- Peter Rindal
  - Oregon State University, 2500 NW Monroe Ave, Corvallis, OR 97331, USA
- Yuhou Xia
  - Princeton University, 304 Washington Rd, Princeton, NJ 08544, USA
|
32
|
Zhu H, Liu X, Lu R, Li H. Efficient and Privacy-Preserving Online Medical Prediagnosis Framework Using Nonlinear SVM. IEEE J Biomed Health Inform 2017; 21:838-850. [DOI: 10.1109/jbhi.2016.2548248] [Citation(s) in RCA: 79] [Impact Index Per Article: 11.3] [Indexed: 11/10/2022]
|
33
|
Prasser F, Kohlmayer F, Spengler H, Kuhn KA. A Scalable and Pragmatic Method for the Safe Sharing of High-Quality Health Data. IEEE J Biomed Health Inform 2017; 22:611-622. [PMID: 28358693 DOI: 10.1109/jbhi.2017.2676880] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/10/2022]
Abstract
The sharing of sensitive personal health data is an important aspect of biomedical research. Methods of data de-identification are often used in this process to trade the granularity of data off against privacy risks. However, traditional approaches, such as HIPAA safe harbor or k-anonymization, often fail to provide data with sufficient quality. Alternatively, data can be de-identified only to a degree which still allows us to use it as required, e.g., to carry out specific analyses. Controlled environments, which restrict the ways recipients can interact with the data, can then be used to cope with residual risks. The contributions of this article are twofold. First, we present a method for implementing controlled data sharing environments and analyze its privacy properties. Second, we present a de-identification method which is specifically suited for sanitizing health data which is to be shared in such environments. Traditional de-identification methods control the uniqueness of records in a dataset. The basic idea of our approach is to reduce the probability that a record in a dataset has characteristics which are unique within the underlying population. As the characteristics of the population are typically not known, we have implemented a pragmatic solution in which properties of the population are modeled with statistical methods. We have further developed an accompanying process for evaluating and validating the degree of protection provided. The results of an extensive experimental evaluation show that our approach enables the safe sharing of high-quality data and that it is highly scalable.
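The abstract contrasts its population-based approach with traditional methods that control the uniqueness of records within the dataset itself. A minimal sketch of that traditional, sample-level check is given here, assuming records are plain dictionaries and that the caller chooses the quasi-identifiers. The field names and toy records are hypothetical, and the sketch does not implement the statistical population model the article describes.

```python
from collections import Counter


def class_key(record, quasi_identifiers):
    """Combination of quasi-identifier values that defines a record's equivalence class."""
    return tuple(record[q] for q in quasi_identifiers)


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(class_key(r, quasi_identifiers) for r in records)


def anonymity_level(records, quasi_identifiers):
    """Largest k such that every record shares its key with at least k - 1 others."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def sample_unique(records, quasi_identifiers):
    """Records whose quasi-identifier combination occurs exactly once in the dataset."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[class_key(r, quasi_identifiers)] == 1]


# Hypothetical toy data: two records share a key, one is unique.
RECORDS = [
    {"age_group": "30-39", "zip3": "021", "diagnosis": "flu"},
    {"age_group": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_group": "40-49", "zip3": "021", "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ["age_group", "zip3"]

if __name__ == "__main__":
    print(anonymity_level(RECORDS, QUASI_IDENTIFIERS))
    print(sample_unique(RECORDS, QUASI_IDENTIFIERS))
```

A record that is unique in the sample is not necessarily unique in the underlying population, which is the gap the population-based approach in the abstract is aimed at.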
|
34
|
Horvitz E, White RW. Making Connections: Advancing Healthcare Research Via Consumer Mobile Devices. Circ Cardiovasc Qual Outcomes 2017; 10:CIRCOUTCOMES.117.003573. [PMID: 28325752 DOI: 10.1161/circoutcomes.117.003573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
35
|
Mata C, Oliver A, Lalande A, Walker P, Martí J. On the Use of XML in Medical Imaging Web-Based Applications. Ing Rech Biomed 2017. [DOI: 10.1016/j.irbm.2016.10.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
|
36
|
Khedr A, Gulak G. SecureMed: Secure Medical Computation Using GPU-Accelerated Homomorphic Encryption Scheme. IEEE J Biomed Health Inform 2017; 22:597-606. [PMID: 28129194 DOI: 10.1109/jbhi.2017.2657458] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/10/2022]
Abstract
Sharing the medical records of individuals among healthcare providers and researchers around the world can accelerate advances in medical research. While the idea seems increasingly practical due to cloud data services, maintaining patient privacy is of paramount importance. Standard encryption algorithms help protect sensitive data from outside attackers but they cannot be used to compute on this sensitive data while being encrypted. Homomorphic Encryption presents a very useful tool that can compute on encrypted data without the need to decrypt it. In this paper, we describe an optimized NTRU-based implementation of the GSW homomorphic encryption scheme. Our results show a factor of 58× improvement in CPU performance compared to other recent work on encrypted medical data under the same security settings. Our system is built to be easily portable to GPUs resulting in an additional speedup of up to a factor of 104× (and 410×) to offer an overall speedup of 6085× (and 24011×) using a single GPU (or four GPUs), respectively.
|
37
|
Ghasemi R, Al Aziz MM, Mohammed N, Dehkordi MH, Jiang X. Private and Efficient Query Processing on Outsourced Genomic Databases. IEEE J Biomed Health Inform 2016; 21:1466-1472. [PMID: 27834660 DOI: 10.1109/jbhi.2016.2625299] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/29/2023]
Abstract
Applications of genomic studies are spreading rapidly in many domains of science and technology such as healthcare, biomedical research, direct-to-consumer services, and legal and forensic work. However, there are a number of obstacles that make it hard to access and process a big genomic database for these applications. First, sequencing a genome is a time-consuming and expensive process. Second, it requires large-scale computation and storage systems to process genomic sequences. Third, genomic databases are often owned by different organizations and thus are not available for public use. The cloud computing paradigm can be leveraged to facilitate the creation and sharing of big genomic databases for these applications. Genomic data owners can outsource their databases to a centralized cloud server to ease access to them. However, data owners are reluctant to adopt this model, as it requires outsourcing the data to an untrusted cloud service provider that may cause data breaches. In this paper, we propose a privacy-preserving model for outsourcing genomic data to a cloud. The proposed model enables query processing while providing privacy protection of genomic databases. Privacy of the individuals is guaranteed by permuting and adding fake genomic records in the database. These techniques allow the cloud to evaluate count and top-k queries securely and efficiently. Experimental results demonstrate that a count and a top-k query over 40 Single Nucleotide Polymorphisms (SNPs) in a database of 20,000 records take around 100 and 150 s, respectively.
|
38
|
Tang H, Jiang X, Wang X, Wang S, Sofia H, Fox D, Lauter K, Malin B, Telenti A, Xiong L, Ohno-Machado L. Protecting genomic data analytics in the cloud: state of the art and opportunities. BMC Med Genomics 2016; 9:63. [PMID: 27733153 PMCID: PMC5062944 DOI: 10.1186/s12920-016-0224-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 04/04/2016] [Accepted: 09/28/2016] [Indexed: 11/17/2022]
Abstract
The outsourcing of genomic data into public cloud computing settings raises concerns over privacy and security. Significant advancements in secure computation methods have emerged over the past several years, but such techniques need to be rigorously evaluated for their ability to support the analysis of human genomic data in an efficient and cost-effective manner. With respect to public cloud environments, there are concerns about the inadvertent exposure of human genomic data to unauthorized users. In analyses involving multiple institutions, there is additional concern about data being used beyond the agreed research scope and being processed in untrusted computational environments, which may not satisfy institutional policies. To systematically investigate these issues, the NIH-funded National Center for Biomedical Computing iDASH (integrating Data for Analysis, 'anonymization' and SHaring) hosted the second Critical Assessment of Data Privacy and Protection competition to assess the capacity of cryptographic technologies for protecting computation over human genomes in the cloud and promoting cross-institutional collaboration. Data scientists were challenged to design and engineer practical algorithms for secure outsourcing of genome computation tasks in working software, whereby analyses are performed only on encrypted data. They were also challenged to develop approaches to enable secure collaboration on data from genomic studies generated by multiple organizations (e.g., medical centers) to jointly compute aggregate statistics without sharing individual-level records. The results of the competition indicated that secure computation techniques can enable comparative analysis of human genomes, but greater efficiency (in terms of compute time and memory utilization) is needed before they are sufficiently practical for real-world environments.
Affiliation(s)
- Haixu Tang: School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Xiaoqian Jiang: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Xiaofeng Wang: School of Informatics and Computing, Indiana University, Bloomington, IN, USA
- Shuang Wang: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Heidi Sofia: National Human Genome Research Institute, Rockville, MD, USA
- Dov Fox: School of Law, University of San Diego, San Diego, CA, USA
- Bradley Malin: Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Li Xiong: Department of Mathematics and Computer Science, Emory University, Atlanta, GA, USA
- Lucila Ohno-Machado: Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
|
39
|
Bogle BM, Mehrotra S. A Moment Matching Approach for Generating Synthetic Data. Big Data 2016; 4:160-178. [PMID: 27642719 DOI: 10.1089/big.2016.0015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Synthetic data are becoming increasingly important mechanisms for sharing data among collaborators and with the public. Multiple methods for the generation of synthetic data have been proposed, but many have shortcomings with respect to maintaining the statistical properties of the original data. We propose a new method for fully synthetic data generation that leverages linear and integer mathematical programming models in order to match the moments of the original data in the synthetic data. This method has no inherent disclosure risk and does not require parametric or distributional assumptions. We demonstrate this methodology using the Framingham Heart Study. Existing synthetic data methods that use chained equations were compared with our approach. We fit Cox proportional hazards, logistic regression, and nonparametric models to synthetic data and compared with models fitted to the original data. True coverage, the proportion of synthetic data parameter confidence intervals that include the original data's parameter estimate, was 100% for parametric models when up to four moments were matched, and consistently outperformed the chained equations approach. The area under the curve and accuracy of the nonparametric models trained on synthetic data marginally differed when tested on the full original data. Models were also trained on synthetic data and a partition of original data and were tested on a held-out portion of original data. Fourth-order moment matched synthetic data outperformed others with respect to fitted parametric models but did not always outperform other methods with fitted nonparametric models. No single synthetic data method consistently outperformed others when assessing the performance of nonparametric models. The performance of fourth-order moment matched synthetic data in fitting parametric models suggests its use in these cases. Our empirical results also suggest that the performance of synthetic data generation techniques, including the moment matching approach, is less stable for use with nonparametric models. The benefits of the moment matching approach should be weighed against additional computational costs. In summary, our results demonstrate that the introduced moment matching approach may be considered as an alternative to existing synthetic data generation methods.
Affiliation(s)
- Brittany Megan Bogle: Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois
- Sanjay Mehrotra: Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois
|
40
|
Gong Y, Fang Y, Guo Y. Private Data Analytics on Biomedical Sensing Data via Distributed Computation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:431-444. [PMID: 26761861 DOI: 10.1109/tcbb.2016.2515610] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]
Abstract
Advances in biomedical sensors and mobile communication technologies have fostered the rapid growth of mobile health (mHealth) applications in the past years. Users generate a high volume of biomedical data during health monitoring, which can be used by the mHealth server for training predictive models for disease diagnosis and treatment. However, the biomedical sensing data raise serious privacy concerns because they reveal sensitive information such as health status and lifestyles of the sensed subjects. This paper proposes and experimentally studies a scheme that keeps the training samples private while enabling accurate construction of predictive models. We specifically consider logistic regression models which are widely used for predicting dichotomous outcomes in healthcare, and decompose the logistic regression problem into small subproblems over two types of distributed sensing data, i.e., horizontally partitioned data and vertically partitioned data. The subproblems are solved using individual private data, and thus mHealth users can keep their private data locally and only upload (encrypted) intermediate results to the mHealth server for model training. Experimental results based on real datasets show that our scheme is highly efficient and scalable to a large number of mHealth users.
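The idea of keeping horizontally partitioned data local and uploading only intermediate results can be illustrated with a simplified sketch. It assumes plain gradient summation for logistic regression and omits encryption entirely. The abstract does not specify the decomposition the authors use, so this is not their scheme, and the names `local_gradient` and `aggregate_gradients` are hypothetical.

```python
import math


def local_gradient(weights, rows, labels):
    """Gradient of the logistic loss over one party's private rows.

    Only this vector, not the rows themselves, would leave the device.
    """
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        z = sum(w * xi for w, xi in zip(weights, x))
        p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return grad


def aggregate_gradients(gradients):
    """Server-side step: sum the per-party gradient vectors element-wise."""
    return [sum(parts) for parts in zip(*gradients)]


# Two hypothetical parties, each holding different rows (horizontal partition).
w = [0.1, -0.2]
rows_a, labels_a = [[1.0, 2.0], [1.0, 0.5]], [1, 0]
rows_b, labels_b = [[1.0, -1.0]], [1]

combined = aggregate_gradients([
    local_gradient(w, rows_a, labels_a),
    local_gradient(w, rows_b, labels_b),
])
pooled = local_gradient(w, rows_a + rows_b, labels_a + labels_b)
```

Because the logistic-loss gradient is a sum over samples, `combined` matches `pooled` up to floating-point rounding, which is why the server can train on the aggregate without seeing individual rows. Vertically partitioned data needs a different decomposition and is not covered here.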
|
41
|
Abstract
BACKGROUND The rapid development of genome sequencing technology allows researchers to access large genome datasets. However, outsourcing the data processing to the cloud poses high risks for personal privacy. The aim of this paper is to give a practical solution for this problem using homomorphic encryption. In our approach, all the computations can be performed in an untrusted cloud without requiring the decryption key or any interaction with the data owner, which preserves the privacy of genome data. METHODS We present evaluation algorithms for secure computation of the minor allele frequencies and χ² statistic in a genome-wide association studies setting. We also describe how to privately compute the Hamming distance and approximate Edit distance between encrypted DNA sequences. Finally, we compare performance details of using two practical homomorphic encryption schemes: the BGV scheme by Gentry, Halevi and Smart and the YASHE scheme by Bos, Lauter, Loftus and Naehrig. RESULTS The approach with the YASHE scheme analyzes data from 400 people within about 2 seconds and picks a variant associated with disease from 311 spots. For another task, using the BGV scheme, it took about 65 seconds to securely compute the approximate Edit distance for DNA sequences of size 5K and figure out the differences between them. CONCLUSIONS The performance numbers for BGV are better than those for YASHE when homomorphically evaluating deep circuits (like the Hamming distance algorithm or approximate Edit distance algorithm). On the other hand, it is more efficient to use the YASHE scheme for a low-degree computation, such as minor allele frequencies or χ² test statistic in a case-control study.
Affiliation(s)
- Miran Kim: Department of Mathematical Sciences, GwanAkRo 1, Seoul, Korea
- Kristin Lauter: Cryptography Research Group, Microsoft Research, Redmond, WA, USA
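As a plaintext reference for two of the quantities named in the abstract above, the following sketch computes a minor allele frequency and a Hamming distance on unencrypted data. It assumes genotypes are coded as the count of the alternate allele per person (0, 1, or 2) and that sequences are equal-length strings. Both encodings and the function names are illustrative assumptions, and none of the homomorphic evaluation described in the abstract is reproduced.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency at one biallelic site.

    genotypes: alternate-allele counts per person, each 0, 1, or 2
    (an assumed encoding, not taken from the paper).
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    total_alleles = 2 * len(genotypes)  # two alleles per diploid person
    alt_freq = sum(genotypes) / total_alleles
    return min(alt_freq, 1.0 - alt_freq)


def hamming_distance(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


maf = minor_allele_frequency([0, 0, 1, 0])   # 1 alternate allele out of 8
dist = hamming_distance("ACGT", "ACCT")      # differs at one position
```

Both functions use only additions and comparisons over the inputs, which gives a rough sense of why these statistics are candidates for evaluation under homomorphic encryption.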
|
42
|
Lu WJ, Yamada Y, Sakuma J. Privacy-preserving genome-wide association studies on cloud environment using fully homomorphic encryption. BMC Med Inform Decis Mak 2015; 15 Suppl 5:S1. [PMID: 26732892 PMCID: PMC4699111 DOI: 10.1186/1472-6947-15-s5-s1] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 11/23/2022]
Abstract
OBJECTIVE Developed sequencing techniques are yielding large-scale genomic data at low cost. A genome-wide association study (GWAS) targeting genetic variations that are significantly associated with a particular disease offers great potential for medical improvement. However, subjects who volunteer their genomic data expose themselves to the risk of privacy invasion; these privacy concerns prevent efficient genomic data sharing. Our goal is to present a cryptographic solution to this problem. METHODS To maintain the privacy of subjects, we propose encryption of all genotype and phenotype data. To allow the cloud to perform meaningful computation in relation to the encrypted data, we use a fully homomorphic encryption scheme. Noting that we can evaluate typical statistics for GWAS from a frequency table, our solution evaluates frequency tables with encrypted genomic and clinical data as input. We propose to use a packing technique for efficient evaluation of these frequency tables. RESULTS Our solution supports evaluation of the D′ measure of linkage disequilibrium, the Hardy-Weinberg Equilibrium, the χ² test, etc. In this paper, we take the χ² test and linkage disequilibrium as examples and demonstrate how we can conduct these algorithms securely and efficiently in an outsourcing setting. We demonstrate with experimentation that secure outsourcing computation of one χ² test with 10,000 subjects requires about 35 ms and evaluation of one linkage disequilibrium with 10,000 subjects requires about 80 ms. CONCLUSIONS With appropriate encoding and packing techniques, cryptographic solutions based on fully homomorphic encryption for secure computations of GWAS can be practical.
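The observation that typical GWAS statistics can be evaluated from a frequency table can be made concrete with the standard Pearson χ² statistic for a 2×2 table. This is a plaintext sketch under the assumption of a 2×2 case-control by allele layout. The function name `chi_square_2x2` and the cell ordering are illustrative, and the encrypted frequency-table evaluation and packing described in the abstract are not reproduced.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table

                 allele 1   allele 2
        cases       a          b
        controls    c          d
    """
    n = a + b + c + d
    # Product of the four marginal totals (row sums and column sums).
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table sums to zero")
    return n * (a * d - b * c) ** 2 / denominator


balanced = chi_square_2x2(10, 10, 10, 10)   # no association: 0.0
stat = chi_square_2x2(10, 20, 30, 40)
```

Only the four cell counts enter the formula, so once the frequency table is available the statistic follows from a handful of multiplications and one division.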
|
43
|
Zhang Y, Dai W, Jiang X, Xiong H, Wang S. FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption. BMC Med Inform Decis Mak 2015; 15 Suppl 5:S5. [PMID: 26733391 PMCID: PMC4698942 DOI: 10.1186/1472-6947-15-s5-s5] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND The increasing availability of genome data motivates massive research studies in personalized treatment and precision medicine. Public cloud services provide a flexible way to mitigate the storage and computation burden in conducting genome-wide association studies (GWAS). However, data privacy is a widespread concern when sharing sensitive information in a cloud environment. METHODS We presented a novel framework (FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption) to fully outsource GWAS (i.e., chi-square statistic computation) using homomorphic encryption. The proposed framework enables secure divisions over encrypted data. We introduced two division protocols (i.e., secure errorless division and secure approximation division) with a trade-off between complexity and accuracy in computing chi-square statistics. RESULTS The proposed framework was evaluated for the task of chi-square statistic computation with two case-control datasets from the 2015 iDASH genome privacy protection challenge. Experimental results show that the performance of FORESEE can be significantly improved through algorithmic optimization and parallel computation. Remarkably, the secure approximation division provides a significant performance gain without missing any significant SNPs in the chi-square association test using the aforementioned datasets. CONCLUSIONS Unlike many existing HME-based studies, in which final results need to be computed by the data owner due to the lack of a secure division operation, the proposed FORESEE framework supports complete outsourcing to the cloud and outputs the final encrypted chi-square statistics.
Affiliation(s)
- Yuchen Zhang: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92093, USA
- Wenrui Dai: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92093, USA
- Xiaoqian Jiang: Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92093, USA
- Hongkai Xiong: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Shuang Wang: Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA 92093, USA
|
44
|
Khazaei H, McGregor C, Eklund JM, El-Khatib K. Real-Time and Retrospective Health-Analytics-as-a-Service: A Novel Framework. JMIR Med Inform 2015; 3:e36. [PMID: 26582268 PMCID: PMC4704962 DOI: 10.2196/medinform.4640] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 05/11/2015] [Revised: 09/04/2015] [Accepted: 09/30/2015] [Indexed: 01/06/2023]
Abstract
BACKGROUND Analytics-as-a-service (AaaS) is one of the latest provisions emerging from the cloud services family. Utilizing this paradigm of computing in health informatics will benefit patients, care providers, and governments significantly. This work is a novel approach to realize health analytics as services in critical care units in particular. OBJECTIVE To design, implement, evaluate, and deploy an extendable big-data compatible framework for health-analytics-as-a-service that offers both real-time and retrospective analysis. METHODS We present a novel framework that can realize health data analytics-as-a-service. The framework is flexible and configurable for different scenarios by utilizing the latest technologies and best practices for data acquisition, transformation, storage, analytics, knowledge extraction, and visualization. We have instantiated the proposed method through the Artemis project, that is, a customization of the framework for live monitoring and retrospective research on premature babies and ill term infants in neonatal intensive care units (NICUs). RESULTS We demonstrated the proposed framework in this paper for monitoring NICUs and refer to it as the Artemis-In-Cloud (Artemis-IC) project. A pilot of Artemis has been deployed in the SickKids hospital NICU. By infusing the output of this pilot setup into an analytical model, we predict important performance measures for the final deployment of Artemis-IC. This process can be carried out for other hospitals following the same steps with minimal effort. SickKids' NICU has 36 beds and can classify the patients generally into 5 different types including surgical and premature babies. The arrival rate is estimated as 4.5 patients per day, and the average length of stay was calculated as 16 days. Mean number of medical monitoring algorithms per patient is 9, which renders 311 live algorithms for the whole NICU running on the framework. The memory and computation power required for Artemis-IC to handle the SickKids NICU will be 32 GB and 16 CPU cores, respectively. The required amount of storage was estimated as 8.6 TB per year. There will always be 34.9 patients in SickKids NICU on average. Currently, 46% of patients cannot get admitted to SickKids NICU due to lack of resources. By increasing the capacity to 90 beds, all patients can be accommodated. For such a provisioning, Artemis-IC will need 16 TB of storage per year, 55 GB of memory, and 28 CPU cores. CONCLUSIONS Our contributions in this work relate to a cloud architecture for the analysis of physiological data for clinical decision support for tertiary care use. We demonstrate how to size the equipment needed in the cloud for that architecture based on a very realistic assessment of the patient characteristics and the associated clinical decision support algorithms that would be required to run for those patients. We show the principle of how this could be performed and furthermore that it can be replicated for any critical care setting within a tertiary institution.
Affiliation(s)
- Hamzeh Khazaei: IBM, Canada Research and Development Center, Markham, Toronto, ON, Canada
|
45
|
|
46
|
Wang S, Zhang Y, Dai W, Lauter K, Kim M, Tang Y, Xiong H, Jiang X. HEALER: homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bioinformatics 2015; 32:211-8. [PMID: 26446135 DOI: 10.1093/bioinformatics/btv563] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 05/19/2015] [Accepted: 09/22/2015] [Indexed: 01/06/2023]
Abstract
MOTIVATION Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put individuals' privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. RESULTS Our algorithm design aims to reduce the computational and storage costs of learning a homomorphic exact logistic regression model (i.e. evaluating P-values of coefficients), where the circuit depth is proportional to the logarithmic scale of data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. AVAILABILITY AND IMPLEMENTATION Download HEALER at http://research.ucsd-dbmi.org/HEALER/ CONTACT: shw070@ucsd.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuang Wang: Department of Biomedical Informatics, University of California, San Diego, CA 92093
- Yuchen Zhang: Department of Biomedical Informatics, University of California, San Diego, CA 92093; Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Wenrui Dai: Department of Biomedical Informatics, University of California, San Diego, CA 92093; Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Miran Kim: Seoul National University, Seoul, 151-742, Republic of Korea
- Yuzhe Tang: Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244, USA
- Hongkai Xiong: Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Xiaoqian Jiang: Department of Biomedical Informatics, University of California, San Diego, CA 92093
|
47
|
|
48
|
Lauter K, López-Alt A, Naehrig M. Private Computation on Encrypted Genomic Data. Progress in Cryptology - LATINCRYPT 2014 2015. [DOI: 10.1007/978-3-319-16295-9_1] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Indexed: 01/12/2023]
|
49
|
Informatics methods in medical privacy. J Biomed Inform 2014; 50:1-3. [DOI: 10.1016/j.jbi.2014.07.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/05/2014] [Revised: 07/07/2014] [Accepted: 07/09/2014] [Indexed: 11/22/2022]
|