1
Casaletto J, Bernier A, McDougall R, Cline MS. Federated Analysis for Privacy-Preserving Data Sharing: A Technical and Legal Primer. Annu Rev Genomics Hum Genet 2023; 24:347-368. [PMID: 37253596; PMCID: PMC10846631; DOI: 10.1146/annurev-genom-110122-084756] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2023]
Abstract
Continued advances in precision medicine rely on the widespread sharing of data that relate human genetic variation to disease. However, data sharing is severely limited by legal, regulatory, and ethical restrictions that safeguard patient privacy. Federated analysis addresses this problem by transferring the code to the data, providing the technical and legal capability to analyze the data within their secure home environment, rather than transferring the data to another institution for analysis. This allows researchers to gain new insights from data that cannot be moved, while respecting patient privacy and the data stewards' legal obligations. Because federated analysis is a technical solution to the legal challenges inherent in data sharing, the technology and policy implications must be evaluated together. Here, we summarize the technical approaches to federated analysis and provide a legal analysis of their policy implications.
Affiliation(s)
- James Casaletto
  - Genomics Institute, University of California, Santa Cruz, California, USA
- Alexander Bernier
  - Centre of Genomics and Policy, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Robyn McDougall
  - Centre of Genomics and Policy, Faculty of Medicine and Health Sciences, McGill University, Montreal, Quebec, Canada
- Melissa S Cline
  - Genomics Institute, University of California, Santa Cruz, California, USA
2
Kuo TT, Jiang X, Tang H, Wang X, Harmanci A, Kim M, Post K, Bu D, Bath T, Kim J, Liu W, Chen H, Ohno-Machado L. The evolving privacy and security concerns for genomic data analysis and sharing as observed from the iDASH competition. J Am Med Inform Assoc 2022; 29:2182-2190. [PMID: 36164820; PMCID: PMC9667175; DOI: 10.1093/jamia/ocac165] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/18/2022] [Revised: 08/25/2022] [Accepted: 09/13/2022] [Indexed: 01/11/2023] [Open Access]
Abstract
Concerns regarding inappropriate leakage of sensitive personal information as well as unauthorized data use are increasing with the growth of genomic data repositories. Therefore, privacy and security of genomic data have become increasingly important and need to be studied. With many proposed protection techniques, their applicability in support of biomedical research should be well understood. For this purpose, we have organized a community effort over the past 8 years through the integrating Data for Analysis, Anonymization, and SHaring (iDASH) consortium to address this practical challenge. In this article, we summarize our experience from these competitions, report lessons learned from the events in 2020/2021 as examples, and discuss potential future research directions in this emerging field.
Affiliation(s)
- Tsung-Ting Kuo, PhD (corresponding author)
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093, USA
- Arif Harmanci
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
- Miran Kim
  - Department of Mathematics, Hanyang University, Seoul, Republic of Korea; Department of Computer Science, Hanyang University, Seoul, Republic of Korea
- Kai Post
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Diyue Bu
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Tyler Bath
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Jihoon Kim
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Weijie Liu
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Hongbo Chen
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, Bloomington, Indiana, USA
- Lucila Ohno-Machado
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA; Division of Health Services Research & Development, Veteran Affairs San Diego Healthcare System, San Diego, California, USA
3
Functional genomics data: privacy risk assessment and technological mitigation. Nat Rev Genet 2022; 23:245-258. [PMID: 34759381; DOI: 10.1038/s41576-021-00428-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Accepted: 10/18/2021] [Indexed: 12/15/2022]
Abstract
The generation of functional genomics data by next-generation sequencing has increased greatly in the past decade. Broad sharing of these data is essential for research advancement but poses notable privacy challenges, some of which are analogous to those that occur when sharing genetic variant data. However, there are also unique privacy challenges that arise from cryptic information leakage during the processing and summarization of functional genomics data from raw reads to derived quantities, such as gene expression values. Here, we review these challenges and present potential solutions for mitigating privacy risks while allowing broad data dissemination and analysis.
4
Widanage C, Liu W, Li J, Chen H, Wang X, Tang H, Fox J. HySec-Flow: Privacy-Preserving Genomic Computing with SGX-based Big-Data Analytics Framework. Proceedings. IEEE International Conference on Cloud Computing 2021; 2021:733-743. [PMID: 35662807; PMCID: PMC9165173; DOI: 10.1109/cloud53861.2021.00098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Trusted execution environments (TEEs) such as Intel's Software Guard Extensions (SGX) have been widely studied to boost security and privacy protection for the computation of sensitive data such as human genomics. However, SGX often introduces a performance hurdle, especially because of its small enclave memory. In this paper, we propose a new Hybrid Secured Flow framework (called "HySec-Flow") for large-scale genomic data analysis using SGX platforms. Here, the data-intensive computing tasks can be partitioned into independent subtasks to be deployed into distinct secured and non-secured containers, therefore allowing for parallel execution while alleviating the limited size of Enclave Page Cache (EPC) memory in each enclave. We illustrate our contributions using a workflow supporting indexing, alignment, dispatching, and merging the execution of SGX-enabled containers. We provide details regarding the architecture of the trusted and untrusted components and the underlying SCONE and Graphene support as generic shielding execution frameworks to port legacy code. We thoroughly evaluate the performance of our privacy-preserving reads mapping algorithm using real human genome sequencing data. The results demonstrate that the performance is enhanced by partitioning the time-consuming genomic computation into subtasks compared to the conventional execution of the data-intensive reads mapping algorithm in an enclave. The proposed HySec-Flow framework is made available as open source and can be adapted to the data-parallel computation of other large-scale genomic tasks requiring security and scalable computational resources.
5
Carpov S, Gama N, Georgieva M, Troncoso-Pastoriza JR. Privacy-preserving semi-parallel logistic regression training with fully homomorphic encryption. BMC Med Genomics 2020; 13:88. [PMID: 32693814; PMCID: PMC7372765; DOI: 10.1186/s12920-020-0723-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/16/2022] [Open Access]
Abstract
Background: Privacy-preserving computation on genomic data, and more generally on medical data, is a critical path technology for innovative, life-saving research to positively and equally impact the global population. It enables medical research algorithms to be securely deployed in the cloud because operations on encrypted genomic databases are conducted without revealing any individual genomes. Methods for secure computation have shown significant performance improvements over the last several years. However, it is still challenging to apply them to large biomedical datasets. Methods: The HE Track of the iDASH 2018 competition focused on solving an important problem in practical machine learning scenarios, where a data analyst who has trained a regression model (either linear or logistic) with a certain set of features attempts to find all features in an encrypted database that will improve the quality of the model. Our solution is based on the hybrid framework Chimera, which allows for switching between different families of fully homomorphic schemes, namely TFHE and HEAAN. Results: Our solution is one of the finalists of Track 2 of the iDASH 2018 competition. Among the submitted solutions, ours is the only bootstrapped approach that can be applied for different sets of parameters without re-encrypting the genomic database, making it practical for real-world applications. Conclusions: This is the first step towards the more general feature selection problem across large encrypted databases.
Affiliation(s)
- Sergiu Carpov
  - CEA, LIST, Point Courier 172, Gif-sur-Yvette cedex, 91191, France; Inpher, Innovation Park A, Lausanne, CH-1015, Switzerland
- Nicolas Gama
  - Inpher, Innovation Park A, Lausanne, CH-1015, Switzerland
- Mariya Georgieva
  - Inpher, Innovation Park A, Lausanne, CH-1015, Switzerland; EPFL, Route Cantonal, Lausanne, CH-1015, Switzerland
6
Wang X, Tang H, Wang S, Jiang X, Wang W, Bu D, Wang L, Jiang Y, Wang C. iDASH secure genome analysis competition 2017. BMC Med Genomics 2018; 11:85. [PMID: 30309344; PMCID: PMC6180354; DOI: 10.1186/s12920-018-0396-0] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 01/16/2023] [Open Access]
Affiliation(s)
- XiaoFeng Wang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Haixu Tang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Shuang Wang
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Xiaoqian Jiang
  - School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX, 77030, USA
- Wenhao Wang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Diyue Bu
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Lei Wang
  - School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, 47408, USA
- Yicheng Jiang
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
- Chenghong Wang
  - UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA