1
Chapman CR. Ethical, legal, and social implications of genetic risk prediction for multifactorial disease: a narrative review identifying concerns about interpretation and use of polygenic scores. J Community Genet 2023; 14:441-452. [PMID: 36529843 PMCID: PMC10576696 DOI: 10.1007/s12687-022-00625-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/01/2022] [Accepted: 12/04/2022] [Indexed: 12/23/2022]
Abstract
Advances in genomics have enabled the development of polygenic scores (PGS), sometimes called polygenic risk scores, in the context of multifactorial diseases and disorders such as cancer, cardiovascular disease, and schizophrenia. PGS estimate an individual's genetic predisposition, as compared to other members of a population, for conditions which are influenced by both genetic and environmental factors. There is significant interest in using genetic risk prediction afforded through PGS in public health, clinical care, and research settings, yet many acknowledge the need to thoughtfully consider and address ethical, legal, and social implications (ELSI). To contribute to this effort, this paper reports on a narrative review of the literature, with the aim of identifying and categorizing ELSI relating to genetic risk prediction in the context of multifactorial disease, which have been raised by scholars in the field. Ninety-two articles, spanning from 1977 to 2021, met the inclusion criteria for this study. Identified ELSI included potential benefits, challenges and risks that focused on concerns about interpretation and use, and ethical obligations to maximize benefits, minimize risks, promote justice, and support autonomy. This research will support geneticists, clinicians, genetic counselors, patients, patient advocates, and policymakers in recognizing and addressing ethical concerns associated with PGS; it will also guide future empirical and normative research.
Affiliation(s)
- Carolyn Riley Chapman
- Department of Population Health (Division of Medical Ethics), NYU Grossman School of Medicine, New York, NY, USA.
- Center for Human Genetics and Genomics, NYU Grossman School of Medicine, Science Building, 435 E. 30th St, 8th Floor, New York, NY, 10016, USA.
2
Privacy-preserving genotype imputation with fully homomorphic encryption. Cell Syst 2022; 13:173-182.e3. [PMID: 34758288 PMCID: PMC8857019 DOI: 10.1016/j.cels.2021.10.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/21/2020] [Revised: 06/28/2021] [Accepted: 10/15/2021] [Indexed: 12/17/2022]
Abstract
Genotype imputation is the inference of unknown genotypes using known population structure observed in large genomic datasets; it can further our understanding of phenotype-genotype relationships and is useful for QTL mapping and GWASs. However, the compute-intensive nature of genotype imputation can overwhelm local servers for computation and storage. Hence, many researchers are moving toward using cloud services, raising privacy concerns. We address these concerns by developing an efficient, privacy-preserving algorithm called p-Impute. Our method uses homomorphic encryption, allowing calculations on ciphertext, thereby avoiding the decryption of private genotypes in the cloud. It is similar to k-nearest neighbor approaches, inferring missing genotypes in a genomic block based on the SNP genotypes of genetically related individuals in the same block. Our results demonstrate accuracy in agreement with the state-of-the-art plaintext solutions. Moreover, p-Impute is scalable to real-world applications as its memory and time requirements increase linearly with the increasing number of samples. p-Impute is freely available for download here: https://doi.org/10.5281/zenodo.5542001.
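The neighbor-based idea in this abstract can be illustrated in plaintext. The sketch below is not p-Impute and uses no encryption; it only shows, under assumed conventions (genotypes coded 0/1/2, `None` for a missing call, Hamming distance over observed SNPs, majority vote), how missing genotypes in one block might be filled from the most similar reference individuals. The function name `impute_block` and the toy panel are hypothetical.

```python
from collections import Counter

def impute_block(target, panel, k=3):
    """Fill missing genotypes (None) in `target` from its k nearest panel rows.

    Assumption: distance is the number of mismatches at the target's
    observed positions; the imputed value is the neighbors' majority vote.
    """
    observed = [i for i, g in enumerate(target) if g is not None]

    def distance(ref):
        return sum(1 for i in observed if ref[i] != target[i])

    # sorted() is stable, so ties keep the panel's original order.
    neighbors = sorted(panel, key=distance)[:k]

    imputed = list(target)
    for i, g in enumerate(target):
        if g is None:
            votes = Counter(ref[i] for ref in neighbors)
            imputed[i] = votes.most_common(1)[0][0]
    return imputed

# Toy reference panel: four individuals, four SNPs in one block.
panel = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [0, 1, 1, 2],
    [2, 0, 0, 1],
]
result = impute_block([0, 1, None, 0], panel, k=3)
print(result)
```

In a homomorphic setting the comparisons and votes would have to be expressed as arithmetic on ciphertexts, which this plaintext sketch does not attempt.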
3
Lu D, Zhang Y, Zhang L, Wang H, Weng W, Li L, Cai H. Methods of privacy-preserving genomic sequencing data alignments. Brief Bioinform 2021; 22:6279828. [PMID: 34021302 DOI: 10.1093/bib/bbab151] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/05/2020] [Revised: 03/10/2021] [Accepted: 03/30/2021] [Indexed: 11/14/2022]
Abstract
Genomic data alignment, a fundamental operation in sequencing, can be used to map reads onto a reference sequence, query a genomic database and perform genetic tests. However, with the reduction of sequencing cost and the accumulation of genome data, privacy-preserving genomic sequencing data alignment is becoming increasingly important. In this paper, we present a comprehensive review of secure genomic data comparison schemes. We discuss the privacy threats, including adversaries and privacy attacks. The attacks can be categorized into inference, membership, identity tracing and completion attacks, and have been used to obtain private genomic information. We classify the state-of-the-art privacy-preserving genomic alignment methods, which aim to ease these privacy threats, into three scenarios: large-scale read mapping, querying of encrypted genomic datasets, and genetic testing. We analyze these approaches comprehensively to evaluate their computation and communication complexity as well as their privacy requirements. The survey provides researchers with current trends and insights into the significance and challenges of privacy issues in genomic data alignment.
Affiliation(s)
- Dandan Lu
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
- Yue Zhang
- School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510006, China
- Ling Zhang
- Department of Radiology, Sun Yat-sen University Cancer Center; State Key Laboratory of Oncology in South China; Collaborative Innovation Center for Cancer Medicine, 651 Dongfeng East Road, Guangzhou, P. R. China, 510060
- Haiyan Wang
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
- Wanlin Weng
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
- Li Li
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
- Hongmin Cai
- School of Computer Science and Engineering, South China University of Technology, Guangzhou, 510006, China
4
Kuo TT, Kim J, Gabriel RA. Privacy-preserving model learning on a blockchain network-of-networks. J Am Med Inform Assoc 2021; 27:343-354. [PMID: 31943009 PMCID: PMC7025358 DOI: 10.1093/jamia/ocz214] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/22/2019] [Revised: 11/04/2019] [Accepted: 12/02/2019] [Indexed: 01/07/2023]
Abstract
Objective: To facilitate clinical/genomic/biomedical research, constructing generalizable predictive models using cross-institutional methods while protecting privacy is imperative. However, state-of-the-art methods assume a “flattened” topology, while real-world research networks may consist of a “network-of-networks”, which raises practical issues including training on small data for rare diseases/conditions, prioritizing locally trained models, and maintaining models for each level of the hierarchy. In this study, we focus on developing a hierarchical approach to inherit the benefits of the privacy-preserving methods, retain the advantages of adopting blockchain, and address practical concerns on a research network-of-networks.
Materials and Methods: We propose a framework to combine level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensemble. We developed an example implementation, HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, and compared its predictive correctness, learning iteration, and execution time with a state-of-the-art method designed for flattened network topology.
Results: HierarchicalChain improves the predictive correctness for small training datasets and provides correctness results comparable with the competing method, with higher learning iteration and similar per-iteration execution time. It inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and immutably records models for each level.
Discussion: HierarchicalChain is independent of the core privacy-preserving learning method, as well as of the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns.
Conclusion: We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction.
Affiliation(s)
- Tsung-Ting Kuo
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Jihoon Kim
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Rodney A Gabriel
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, California, USA
- Department of Anesthesiology, University of California San Diego, San Diego, California, USA
5
Almeida JR, Pratas D, Oliveira JL. A semi-automatic methodology for analysing distributed and private biobanks. Comput Biol Med 2020; 130:104180. [PMID: 33360272 DOI: 10.1016/j.compbiomed.2020.104180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2020] [Revised: 12/14/2020] [Accepted: 12/14/2020] [Indexed: 10/22/2022]
Abstract
Privacy issues limit the analysis and cross-exploration of most distributed and private biobanks. These issues often arise from the multidimensionality and sensitivity of the data, together with the associated access restrictions and policies. These characteristics prevent collaboration between entities, constituting a barrier to emergent personalized and public health challenges, namely the discovery of new druggable targets, identification of disease-causing genetic variants, or the study of rare diseases. In this paper, we propose a semi-automatic methodology for the analysis of distributed and private biobanks. The strategies involved in the proposed methodology efficiently enable the creation and execution of unified genomic studies using distributed repositories, without compromising the information present in the datasets. We apply the methodology to a Covid-19 case study, combining the diagnostics from multiple entities while maintaining privacy through a completely identical procedure. Moreover, we show that the methodology follows a simple, intuitive, and practical scheme.
Affiliation(s)
- João Rafael Almeida
- DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain.
- Diogo Pratas
- DETI/IEETA, University of Aveiro, Aveiro, Portugal; Department of Virology, University of Helsinki, Helsinki, Finland.
6
Telenti A, Jiang X. Treating medical data as a durable asset. Nat Genet 2020; 52:1005-1010. [PMID: 32929286 DOI: 10.1038/s41588-020-0698-y] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 03/30/2020] [Accepted: 08/21/2020] [Indexed: 11/09/2022]
Abstract
Access to medical data is central for conducting research on genomics. However, to tap these metadata (observable traits and phenotypes, diagnoses and medication, and labels), researchers must grapple with the complex and sensitive nature of the information. In this Perspective, we argue that, at this exciting time for genomics and artificial intelligence, several critical aspects of data generation, infrastructure and management are pillars of a modern data ecosystem. Many risks to privacy and many obstacles to medical research can be eliminated or mitigated by new secure data analytics. Finally, we discuss the potential consequences of medical data exiting the institutions and being managed by individuals. These shifts in data ownership have the potential for profound disruption and opportunity across many fields.
Affiliation(s)
- Amalio Telenti
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA, USA.
- Xiaoqian Jiang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
7
Gingras SN, Tang D, Tuff J, McLaren PJ. Minding the gap in HIV host genetics: opportunities and challenges. Hum Genet 2020; 139:865-875. [PMID: 32409920 PMCID: PMC7272494 DOI: 10.1007/s00439-020-02177-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/16/2019] [Accepted: 03/12/2020] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies (GWAS) have been successful in identifying and confirming novel genetic variants that are associated with diverse HIV phenotypes. However, these studies have predominantly focused on European cohorts. HLA molecules have been consistently associated with HIV outcomes, some of which have been found to be population specific, underscoring the need for diversity in GWAS. Recently, there has been a concerted effort to address this gap, which leads to disparities in health care (disease prevention, diagnosis, treatment), but with only marginal improvement. As precision medicine becomes more widely used, non-European individuals will be increasingly disadvantaged, as the genetic variants identified in genomic research based on European populations may not accurately reflect those of non-European individuals. Leveraging pre-existing, large, multiethnic cohorts, such as the UK Biobank, 23andMe, and the National Institutes of Health's All of Us Research Program, can help increase genomic research in non-European populations and ultimately lead to better health outcomes.
Affiliation(s)
- Shanelle N. Gingras
- JC Wilt Infectious Diseases Research Centre, National HIV and Retrovirology Lab, National Microbiology Laboratories, Public Health Agency of Canada, Winnipeg, Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
- David Tang
- JC Wilt Infectious Diseases Research Centre, National HIV and Retrovirology Lab, National Microbiology Laboratories, Public Health Agency of Canada, Winnipeg, Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
- Jeffrey Tuff
- JC Wilt Infectious Diseases Research Centre, National HIV and Retrovirology Lab, National Microbiology Laboratories, Public Health Agency of Canada, Winnipeg, Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
- Paul J. McLaren
- JC Wilt Infectious Diseases Research Centre, National HIV and Retrovirology Lab, National Microbiology Laboratories, Public Health Agency of Canada, Winnipeg, Canada
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
8
Kockan C, Zhu K, Dokmai N, Karpov N, Kulekci MO, Woodruff DP, Sahinalp SC. Sketching algorithms for genomic data analysis and querying in a secure enclave. Nat Methods 2020; 17:295-301. [PMID: 32132732 DOI: 10.1038/s41592-020-0761-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/17/2019] [Accepted: 01/22/2020] [Indexed: 11/09/2022]
Abstract
Genome-wide association studies (GWAS), especially on rare diseases, may necessitate exchange of sensitive genomic data between multiple institutions. Since genomic data sharing is often infeasible due to privacy concerns, cryptographic methods, such as secure multiparty computation (SMC) protocols, have been developed with the aim of offering privacy-preserving collaborative GWAS. Unfortunately, the computational overhead of these methods remains prohibitive for human-genome-scale data. Here we introduce SkSES (https://github.com/ndokmai/sgx-genome-variants-search), a hardware-software hybrid approach for privacy-preserving collaborative GWAS, which improves the running time of the most advanced cryptographic protocols by two orders of magnitude. The SkSES approach is based on trusted execution environments (TEEs) offered by current-generation microprocessors, in particular Intel's SGX. To overcome the severe memory limitation of the TEEs, SkSES employs novel 'sketching' algorithms that maintain essential statistical information on genomic variants in input VCF files. By additionally incorporating efficient data compression and population stratification reduction methods, SkSES identifies the top k genomic variants in a cohort quickly, accurately and in a privacy-preserving manner.
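To give a sense of what a sketching data structure buys under a tight memory budget, here is a generic count-min sketch. It is a standard textbook structure swapped in for illustration and is not claimed to be the algorithm SkSES uses; the variant identifiers, table sizes, and the helper `top_k` are hypothetical. Memory stays fixed at `width * depth` counters regardless of how many variants are streamed, and each estimate is never below the true count.

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency summary; estimates can overcount, never undercount."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, key, row):
        # One deterministic hash per row, derived by salting with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += count

    def estimate(self, key):
        # The minimum over rows is the estimate least inflated by collisions.
        return min(self.table[row][self._index(key, row)]
                   for row in range(self.depth))

def top_k(sketch, candidates, k):
    """Rank candidate keys by estimated count (hypothetical helper)."""
    return sorted(candidates, key=sketch.estimate, reverse=True)[:k]

# Hypothetical stream of variant observations from case samples.
sketch = CountMinSketch()
for variant in ["chr1:1000:A>G"] * 5 + ["chr2:2000:C>T"] * 2:
    sketch.add(variant)
print(top_k(sketch, ["chr1:1000:A>G", "chr2:2000:C>T"], k=1))
```

A real association ranking would track a test statistic per variant rather than a raw count; the fixed-memory trade-off is the point here.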
Affiliation(s)
- Can Kockan
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Kaiyuan Zhu
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Natnatee Dokmai
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- Nikolai Karpov
- Department of Computer Science, Indiana University, Bloomington, IN, USA
- M Oguzhan Kulekci
- Informatics Institute, Istanbul Technical University, Istanbul, Turkey
- David P Woodruff
- Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
9
Toh S. Analytic and Data Sharing Options in Real-World Multidatabase Studies of Comparative Effectiveness and Safety of Medical Products. Clin Pharmacol Ther 2020; 107:834-842. [PMID: 31869442 DOI: 10.1002/cpt.1754] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/19/2019] [Accepted: 11/21/2019] [Indexed: 12/20/2022]
Abstract
A wide range of analytic and data sharing options are available in nonexperimental multidatabase studies designed to assess the real-world benefits and risks of medical products. Researchers often consider six scientific domains when choosing among these options: study design, exposure type, outcome type, covariate summarization technique, covariate adjustment method, and data sharing approach. This article reviews available analytic and data sharing options and discusses key scientific and practical considerations when choosing among these options in multidatabase studies of comparative effectiveness and safety of medical products. The scientific considerations must be balanced against what the data-contributing sites are able or willing to share. While pooling of person-level data sets remains the most familiar and analytically flexible approach, newer analytic and data sharing approaches that share less granular summary-level information may be equally valid and preferred in some multidatabase studies, especially when sharing of person-level data is challenging or infeasible.
Affiliation(s)
- Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA
10
Boyce A, Walker A, Duggal P, Thio CL, Geller G. Personal Genetic Information about HIV: Research Participants' Views of Ethical, Social, and Behavioral Implications. Public Health Genomics 2019; 22:36-45. [PMID: 31461719 DOI: 10.1159/000501672] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/17/2018] [Accepted: 06/21/2019] [Indexed: 11/19/2022]
Abstract
BACKGROUND Personal genetic information (PGI) about HIV is produced in research and entering the clinic and direct-to-consumer market, but little consideration has been given to ethical and social issues, public perspectives, and potential behavioral implications. OBJECTIVES This research queried the views of research participants at risk for or infected with HIV, exploring their perspectives on HIV-related PGI and its ethical, social, and behavioral implications. METHODS We used focus groups to collect rich information about participants' perspectives on the ethical, social, and behavioral implications of PGI about HIV and host genetic research. We evaluated their reactions to three different types of genetic variants: those that made them more susceptible to HIV, more protected from or resistant to HIV, or more likely to transmit HIV to others. RESULTS Overall, participants wanted PGI about HIV. Their reasons included a mix of personal or family health benefit and benefit to others, which varied in emphasis depending on variant type. While susceptibility variant information was seen primarily in terms of personal or family health benefit, for transmissibility and protective variant information, benefit to others emerged as a major reason for wanting PGI about HIV. Participants thought transmissibility variant information would help them prevent others from becoming infected, and protective variant information would allow them to volunteer for targeted research to help treat, cure, or prevent HIV. Possible harms were raised regarding the tendencies among some individuals to increase risky behavior with modulations in perceived risk. Potential behavioral implications were seen as significant, though complex, reflecting multifaceted risk perceptions. CONCLUSIONS Our study adds to the evidence that participants in genetic research, across disease type, have a strong desire for PGI. 
For participants in research on the genetics of HIV, and potentially other infectious diseases, their desire for PGI is grounded in a perceived duty not to infect others, where they feel a moral responsibility regarding research participation and behavior change. Wider dissemination of HIV-related PGI may well increase research participation, but could have mixed effects on risk behavior. More research is needed on the implications of different variant types of PGI beyond susceptibility factors, especially protective variants or resistance factors.
Affiliation(s)
- Angie Boyce
- Berman Institute of Bioethics, Johns Hopkins University, Baltimore, Maryland, USA
- Alexis Walker
- Berman Institute of Bioethics, Johns Hopkins University, Baltimore, Maryland, USA
- Priya Duggal
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Chloe L Thio
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Gail Geller
- Berman Institute of Bioethics, Johns Hopkins University, Baltimore, Maryland, USA
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
11
Carter AB. Considerations for Genomic Data Privacy and Security when Working in the Cloud. J Mol Diagn 2019; 21:542-552. [DOI: 10.1016/j.jmoldx.2018.07.009] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 12/05/2017] [Revised: 05/16/2018] [Accepted: 07/02/2018] [Indexed: 01/21/2023]
12
Systematizing Genome Privacy Research: A Privacy-Enhancing Technologies Perspective. PROCEEDINGS ON PRIVACY ENHANCING TECHNOLOGIES 2018. [DOI: 10.2478/popets-2019-0006] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 11/20/2022]
Abstract
Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy experts.
13
Privacy-Preserving Similar Patient Queries for Combined Biomedical Data. PROCEEDINGS ON PRIVACY ENHANCING TECHNOLOGIES 2018. [DOI: 10.2478/popets-2019-0004] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
Abstract
The decreasing costs of molecular profiling have fueled the biomedical research community with a plethora of new types of biomedical data, enabling a breakthrough towards more precise and personalized medicine. Naturally, the increasing availability of data also enables physicians to compare patients’ data and treatments easily and to find similar patients in order to propose the optimal therapy. Such similar patient queries (SPQs) are of utmost importance to medical practice and will be relied upon in future health information exchange systems. While privacy-preserving solutions have been previously studied, those are limited to genomic data, ignoring the different newly available types of biomedical data.
In this paper, we propose new cryptographic techniques for finding similar patients in a privacy-preserving manner with various types of biomedical data, including genomic, epigenomic and transcriptomic data as well as their combination. We design protocols for two of the most common similarity metrics in biomedicine: the Euclidean distance and the Pearson correlation coefficient. Moreover, unlike previous approaches, we account for the fact that certain locations contribute differently to a given disease or phenotype by allowing the query to be limited to the relevant locations and by assigning those locations different weights. Our protocols are specifically designed to be highly efficient in terms of communication and bandwidth, requiring only one or two rounds of communication and thus enabling scalable parallel queries. We rigorously prove our protocols to be secure based on cryptographic games and instantiate our technique with three of the most important types of biomedical data – namely DNA, microRNA expression, and DNA methylation. Our experimental results show that our protocols can compute a similarity query over a typical number of positions against a database of 1,000 patients in a few seconds. Finally, we propose and formalize strategies to mitigate the threat of malicious users or hospitals.
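The two similarity metrics named in this abstract can be written out in plaintext as a reference point. The sketch below does not implement the paper's cryptographic protocols; it only shows, with hypothetical function names and toy vectors, a Euclidean distance restricted to selected positions with per-position weights, and a Pearson correlation restricted to the same positions (left unweighted here, which is an assumption).

```python
import math

def weighted_euclidean(x, y, positions, weights):
    """Euclidean distance over the selected positions, each scaled by a weight."""
    return math.sqrt(sum(w * (x[i] - y[i]) ** 2
                         for i, w in zip(positions, weights)))

def pearson(x, y, positions):
    """Pearson correlation over the selected positions only.

    Assumes neither selected sub-vector is constant (non-zero variance).
    """
    xs = [x[i] for i in positions]
    ys = [y[i] for i in positions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy query and database record (e.g. expression or methylation levels).
query = [1.0, 2.0, 3.0, 4.0]
patient = [1.0, 4.0, 3.0, 0.0]
positions = [0, 1, 3]          # only these locations are relevant
weights = [1.0, 0.5, 0.25]     # hypothetical per-location weights

distance = weighted_euclidean(query, patient, positions, weights)
correlation = pearson(query, patient, positions)
print(distance, correlation)
```

A privacy-preserving protocol would compute the same quantities without either party seeing the other's vector in the clear.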
14
Raisaro JL, Pradervand S, Colsenet R, Jacquemont N, Rosat N, Mooser V, Hubaux JP. Protecting Privacy and Security of Genomic Data in i2b2 with Homomorphic Encryption and Differential Privacy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1413-1426. [PMID: 30004884 DOI: 10.1109/tcbb.2018.2854782] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 06/08/2023]
Abstract
Re-use of patients' health records can provide tremendous benefits for clinical research. Yet, when researchers need to access sensitive/identifying data, such as genomic data, in order to compile cohorts of well-characterized patients for specific studies, privacy and security concerns represent major obstacles that make such a procedure extremely difficult if not impossible. In this paper, we address the challenge of designing and deploying in a real operational setting an efficient privacy-preserving explorer for genetic cohorts. Our solution is built on top of the i2b2 (Informatics for Integrating Biology and the Bedside) framework and leverages cutting-edge privacy-enhancing technologies such as homomorphic encryption and differential privacy. Solutions involving homomorphic encryption are often believed to be costly and immature for use in operational environments. Here, we show that, for specific applications, homomorphic encryption is actually a very efficient enabler. Indeed, our solution outperforms prior work by enabling a researcher to securely compute simple statistics on more than 3,000 encrypted genetic variants simultaneously for a cohort of 5,000 individuals in less than 5 seconds with commodity hardware. To the best of our knowledge, our privacy-preserving solution is the first to also be successfully deployed and tested in an operational setting (Lausanne University Hospital).
15
Raisaro JL, Klann JG, Wagholikar KB, Estiri H, Hubaux JP, Murphy SN. Feasibility of Homomorphic Encryption for Sharing I2B2 Aggregate-Level Data in the Cloud. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2018; 2017:176-185. [PMID: 29888067 PMCID: PMC5961814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
The biomedical community is lagging in the adoption of cloud computing for the management of medical data. The primary obstacles are concerns about privacy and security. In this paper, we explore the feasibility of using advanced privacy-enhancing technologies in order to enable the sharing of sensitive clinical data in a public cloud. Our goal is to facilitate sharing of clinical data in the cloud by minimizing the risk of unintended leakage of sensitive clinical information. In particular, we focus on homomorphic encryption, a specific type of encryption that offers the ability to run computation on the data while the data remains encrypted. This paper demonstrates that homomorphic encryption can be used efficiently to compute aggregating queries on the ciphertexts, along with providing end-to-end confidentiality of aggregate-level data from the i2b2 data model.
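The property described here, computing an aggregate on ciphertexts without decrypting them, can be demonstrated with a toy additively homomorphic scheme. The sketch below uses textbook Paillier encryption as a stand-in; the abstract does not say which scheme the authors used, and the tiny primes, function names, and per-site counts are all illustrative assumptions. It is insecure at this key size and meant only to show that multiplying ciphertexts adds the underlying plaintexts.

```python
import math
import random
from functools import reduce

def keygen(p, q):
    """Textbook Paillier keys with generator g = n + 1 (toy-sized primes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; needs Python 3.8+
    return n, (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # c = (n+1)^m * r^n mod n^2
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts yields an encryption of the plaintext sum."""
    return (c1 * c2) % (n * n)

def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

# Hypothetical per-site patient counts for one aggregate query.
n, private_key = keygen(1789, 1861)
site_counts = [12, 7, 30]
ciphertexts = [encrypt(n, count) for count in site_counts]

# The cloud combines ciphertexts without ever holding the private key.
aggregate = reduce(lambda a, b: add_encrypted(n, a, b), ciphertexts)
total = decrypt(n, private_key, aggregate)
print(total)
```

Only the holder of the private key can read the total; the party doing the aggregation sees ciphertexts alone. Sums must stay below `n` to decrypt correctly.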
Affiliation(s)
- Jeffrey G Klann
- Partners Healthcare, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Massachusetts General Hospital, Boston, MA, USA
- Kavishwar B Wagholikar
- Partners Healthcare, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Massachusetts General Hospital, Boston, MA, USA
- Hossein Estiri
- Partners Healthcare, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Massachusetts General Hospital, Boston, MA, USA
- Shawn N Murphy
- Partners Healthcare, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Massachusetts General Hospital, Boston, MA, USA
|
16
|
Raisaro JL, McLaren PJ, Fellay J, Cavassini M, Klersy C, Hubaux JP. Are privacy-enhancing technologies for genomic data ready for the clinic? A survey of medical experts of the Swiss HIV Cohort Study. J Biomed Inform 2018; 79:1-6. [PMID: 29331453 DOI: 10.1016/j.jbi.2017.12.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/12/2017] [Revised: 12/21/2017] [Accepted: 12/23/2017] [Indexed: 12/23/2022]
Abstract
PURPOSE Protecting patient privacy is a major obstacle for the implementation of genomic-based medicine. Emerging privacy-enhancing technologies can become key enablers for managing sensitive genetic data. We studied physicians' attitudes toward this kind of technology in order to derive insights that might foster its future adoption for clinical care. METHODS We conducted a questionnaire-based survey among 55 physicians of the Swiss HIV Cohort Study who tested the first implementation of a privacy-preserving model for delivering genomic test results. We evaluated their feedback on three different aspects of our model: clinical utility, ability to address privacy concerns and system usability. RESULTS 38/55 (69%) physicians participated in the study. Two thirds of them acknowledged genetic privacy as a key aspect that needs to be protected to help build patient trust and deploy new-generation medical information systems. All of them successfully used the tool for evaluating their patients' pharmacogenomics risk and 90% were happy with the user experience and the efficiency of the tool. Only 8% of physicians were unsatisfied with the level of information and wanted to have access to the patient's actual DNA sequence. CONCLUSION This survey, although limited in size, represents the first evaluation of privacy-preserving models for genomic-based medicine. It has allowed us to derive unique insights that will improve the design of these new systems in the future. In particular, we have observed that a clinical information system that uses homomorphic encryption to provide clinicians with risk information based on sensitive genetic test results can offer information that clinicians feel is sufficient for their needs and appropriately respectful of patients' privacy.
The ability of this kind of system to ensure strong security and privacy guarantees and to provide some analytics on encrypted data has been assessed as a key enabler for the management of sensitive medical information in the near future. Providing clinically relevant information to physicians while protecting patients' privacy in order to comply with regulations is crucial for the widespread use of these new technologies.
Affiliation(s)
- Jean-Louis Raisaro
- School of Computer Communications Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
- Paul J McLaren
- J.C. Wilt Infectious Diseases Research Centre, National Microbiology Laboratories, Public Health Agency of Canada, Winnipeg, Canada; Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Canada
- Jacques Fellay
- School of Life Sciences, École Polytechnique Fédérale de Lausanne, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Matthias Cavassini
- Division of Infectious Diseases, Lausanne University Hospital, Switzerland
- Catherine Klersy
- Service of Biometry and Clinical Epidemiology, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Jean-Pierre Hubaux
- School of Computer Communications Sciences, École Polytechnique Fédérale de Lausanne, Switzerland
|
17
|
Chen F, Wang S, Jiang X, Ding S, Lu Y, Kim J, Sahinalp SC, Shimizu C, Burns JC, Wright VJ, Png E, Hibberd ML, Lloyd DD, Yang H, Telenti A, Bloss CS, Fox D, Lauter K, Ohno-Machado L. PRINCESS: Privacy-protecting Rare disease International Network Collaboration via Encryption through Software guard extensionS. Bioinformatics 2017; 33:871-878. [PMID: 28065902 DOI: 10.1093/bioinformatics/btw758] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 05/19/2016] [Accepted: 11/23/2016] [Indexed: 12/19/2022] Open
Abstract
Motivation We introduce PRINCESS, a privacy-preserving international collaboration framework for analyzing rare disease genetic data that are distributed across different continents. PRINCESS leverages Software Guard Extensions (SGX) and hardware for trustworthy computation. Unlike a traditional international collaboration model, where individual-level patient DNA data are physically centralized at a single site, PRINCESS performs a secure and distributed computation over encrypted data, fulfilling institutional policies and regulations for protected health information. Results To demonstrate PRINCESS' performance and feasibility, we conducted a family-based allelic association study for Kawasaki Disease, with data hosted on three different continents. The experimental results show that PRINCESS provides secure and accurate analyses much faster than alternative solutions, such as homomorphic encryption and garbled circuits (over 40 000× faster). Availability and Implementation https://github.com/achenfengb/PRINCESS_opensource. Contact shw070@ucsd.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Feng Chen
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Shuang Wang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Xiaoqian Jiang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Sijie Ding
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Yao Lu
- Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
- Jihoon Kim
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- S Cenk Sahinalp
- Department of Computer Science and Informatics, Indiana University, Bloomington, IN, USA
- Chisato Shimizu
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Jane C Burns
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Eileen Png
- Genome Institute of Singapore, ASTAR, Singapore, Singapore
- David D Lloyd
- Department of Pediatrics, School of Medicine, Emory University, Atlanta, GA, USA
- Hai Yang
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Cinnamon S Bloss
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Dov Fox
- School of Law, University of San Diego, San Diego, CA, USA
- Kristin Lauter
- Cryptography Group, Microsoft Research, San Diego, CA, USA
- Lucila Ohno-Machado
- Health System Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
|
18
|
Wang S, Jiang X, Tang H, Wang X, Bu D, Carey K, Dyke SO, Fox D, Jiang C, Lauter K, Malin B, Sofia H, Telenti A, Wang L, Wang W, Ohno-Machado L. A community effort to protect genomic data sharing, collaboration and outsourcing. NPJ Genom Med 2017; 2:33. [PMID: 29263842 PMCID: PMC5677972 DOI: 10.1038/s41525-017-0036-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/30/2017] [Revised: 07/10/2017] [Accepted: 10/10/2017] [Indexed: 12/13/2022] Open
Abstract
The human genome can reveal sensitive information and is potentially re-identifiable, which raises privacy and security concerns about sharing such data on a wide scale. In 2016, we organized the third Critical Assessment of Data Privacy and Protection competition as a community effort to bring together biomedical informaticists, computer privacy and security researchers, and scholars in ethical, legal, and social implications (ELSI) to assess the latest advances in privacy-preserving techniques for protecting human genomic data. Teams were asked to develop novel protection methods for emerging genome privacy challenges in three scenarios: Track (1) data sharing through the Beacon service of the Global Alliance for Genomics and Health; Track (2) collaborative discovery of similar genomes between two institutions; and Track (3) data outsourcing to public cloud services. The latter two tracks represent continuing themes from our 2015 competition, while the former was new and a response to a recently established vulnerability. The winning strategy for Track 1 mitigated the privacy risk by hiding approximately 11% of the variation in the database while permitting around 160,000 queries, a significant improvement over the baseline. The winning strategies in Tracks 2 and 3 showed significant progress over the previous competition by achieving multiple orders of magnitude performance improvement in terms of computational runtime and memory requirements. The outcomes suggest that applying highly optimized privacy-preserving and secure computation techniques to safeguard genomic data sharing and analysis is useful. However, the results also indicate that further efforts are needed to refine these techniques into practical solutions.
Affiliation(s)
- Shuang Wang
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
- Xiaoqian Jiang
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
- Haixu Tang
- Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Xiaofeng Wang
- Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Diyue Bu
- Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Knox Carey
- GeneCloud, Intertrust, Sunnyvale, CA 94085 USA
- Stephanie Om Dyke
- Centre of Genomics and Policy, Department of Human Genetics, McGill University, Montreal, QC H3A 0G4 Canada
- Dov Fox
- School of Law, University of San Diego, San Diego, CA 92110 USA
- Chao Jiang
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
- Kristin Lauter
- Cryptography Group, Microsoft Research, San Diego, CA 92122 USA
- Bradley Malin
- Department of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN 37203 USA
- Heidi Sofia
- National Human Genome Research Institute, Rockville, MD 20894 USA
- Lei Wang
- Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Wenhao Wang
- Computer Science and Informatics, Indiana University, Bloomington, IN 47408 USA
- Lucila Ohno-Machado
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA 92093 USA
|
19
|
Lippert C, Sabatini R, Maher MC, Kang EY, Lee S, Arikan O, Harley A, Bernal A, Garst P, Lavrenko V, Yocum K, Wong T, Zhu M, Yang WY, Chang C, Lu T, Lee CWH, Hicks B, Ramakrishnan S, Tang H, Xie C, Piper J, Brewerton S, Turpaz Y, Telenti A, Roby RK, Och FJ, Venter JC. Identification of individuals by trait prediction using whole-genome sequencing data. Proc Natl Acad Sci U S A 2017; 114:10166-10171. [PMID: 28874526 PMCID: PMC5617305 DOI: 10.1073/pnas.1711125114] [Citation(s) in RCA: 96] [Impact Index Per Article: 13.7] [Indexed: 11/18/2022] Open
Abstract
Prediction of human physical traits and demographic information from genomic data challenges privacy and data deidentification in personalized medicine. To explore the current capabilities of phenotype-based genomic identification, we applied whole-genome sequencing, detailed phenotyping, and statistical modeling to predict biometric traits in a cohort of 1,061 participants of diverse ancestry. Individually, for a large fraction of the traits, their predictive accuracy beyond ancestry and demographic information is limited. However, we have developed a maximum entropy algorithm that integrates multiple predictions to determine which genomic samples and phenotype measurements originate from the same person. Using this algorithm, we have reidentified an average of >8 of 10 held-out individuals in an ethnically mixed cohort and an average of 5 of either 10 African Americans or 10 Europeans. This work challenges current conceptions of personal privacy and may have far-reaching ethical and legal implications.
Affiliation(s)
- Okan Arikan
- Human Longevity, Inc., Mountain View, CA 94303
- Axel Bernal
- Human Longevity, Inc., Mountain View, CA 94303
- Peter Garst
- Human Longevity, Inc., Mountain View, CA 94303
- Ken Yocum
- Human Longevity, Inc., Mountain View, CA 94303
- Mingfu Zhu
- Human Longevity, Inc., Mountain View, CA 94303
- Chris Chang
- Human Longevity, Inc., Mountain View, CA 94303
- Tim Lu
- Human Longevity, Inc., San Diego, CA 92121
- Barry Hicks
- Human Longevity, Inc., Mountain View, CA 94303
- Haibao Tang
- Human Longevity, Inc., Mountain View, CA 94303
- Chao Xie
- Human Longevity Singapore, Pte. Ltd., Singapore 138542
- Jason Piper
- Human Longevity Singapore, Pte. Ltd., Singapore 138542
- Yaron Turpaz
- Human Longevity, Inc., San Diego, CA 92121
- Human Longevity Singapore, Pte. Ltd., Singapore 138542
- Rhonda K Roby
- Human Longevity, Inc., San Diego, CA 92121
- J. Craig Venter Institute, La Jolla, CA 92037
- Franz J Och
- Human Longevity, Inc., Mountain View, CA 94303
- J Craig Venter
- Human Longevity, Inc., San Diego, CA 92121
- J. Craig Venter Institute, La Jolla, CA 92037
|
20
|
Sousa JS, Lefebvre C, Huang Z, Raisaro JL, Aguilar-Melchor C, Killijian MO, Hubaux JP. Efficient and secure outsourcing of genomic data storage. BMC Med Genomics 2017; 10:46. [PMID: 28786363 PMCID: PMC5547444 DOI: 10.1186/s12920-017-0275-0] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 12/22/2022] Open
Abstract
Background Cloud computing is becoming the preferred solution for efficiently dealing with the increasing amount of genomic data. Yet, outsourcing the storage and processing of sensitive information, such as genomic data, comes with important concerns related to privacy and security. This calls for new sophisticated techniques that ensure data protection from untrusted cloud providers and that still enable researchers to obtain useful information. Methods We present a novel privacy-preserving algorithm for fully outsourcing the storage of large genomic data files to a public cloud and enabling researchers to efficiently search for variants of interest. In order to protect data and query confidentiality from possible leakage, our solution exploits optimal encoding for genomic variants and combines it with homomorphic encryption and private information retrieval. Our proposed algorithm is implemented in C++ and was evaluated on real data as part of the 2016 iDash Genome Privacy-Protection Challenge. Results Results show that our solution outperforms the state-of-the-art solutions and enables researchers to search over millions of encrypted variants in a few seconds. Conclusions As opposed to prior beliefs that sophisticated privacy-enhancing technologies (PETs) are impractical for real operational settings, our solution demonstrates that, in the case of genomic data, PETs are very efficient enablers.
Affiliation(s)
- João Sá Sousa
- Laboratory for Communications and Applications - LCA 1, École Polytechnique Fédérale de Lausanne, Route Cantonale, Lausanne, 1015, Switzerland
- Cédric Lefebvre
- Laboratory for Analysis and Architecture of Systems - LAAS-CNRS, Université Toulouse, 7 Avenue du Colonel Roche, Toulouse, 31400, France
- Zhicong Huang
- Laboratory for Communications and Applications - LCA 1, École Polytechnique Fédérale de Lausanne, Route Cantonale, Lausanne, 1015, Switzerland
- Jean Louis Raisaro
- Laboratory for Communications and Applications - LCA 1, École Polytechnique Fédérale de Lausanne, Route Cantonale, Lausanne, 1015, Switzerland
- Carlos Aguilar-Melchor
- Toulouse Institute of Computer Science Research - IRIT, Université Toulouse, 118 Route de Narbonne, Toulouse, F-31062, France
- Marc-Olivier Killijian
- Laboratory for Analysis and Architecture of Systems - LAAS-CNRS, Université Toulouse, 7 Avenue du Colonel Roche, Toulouse, 31400, France
- Jean-Pierre Hubaux
- Laboratory for Communications and Applications - LCA 1, École Polytechnique Fédérale de Lausanne, Route Cantonale, Lausanne, 1015, Switzerland
|
21
|
Chen F, Wang C, Dai W, Jiang X, Mohammed N, Al Aziz MM, Sadat MN, Sahinalp C, Lauter K, Wang S. PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre Guard Extension. BMC Med Genomics 2017; 10:48. [PMID: 28786365 PMCID: PMC5547453 DOI: 10.1186/s12920-017-0281-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 12/04/2022] Open
Abstract
Background Advances in DNA sequencing technologies have prompted a wide range of genomic applications to improve healthcare and facilitate biomedical research. However, privacy and security concerns have emerged as a challenge for utilizing cloud computing to handle sensitive genomic data. Methods We present one of the first implementations of a Software Guard Extension (SGX)-based, securely outsourced genetic testing framework, which leverages multiple cryptographic protocols and a minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. Results We compared the performance of the proposed PRESAGE framework with the state-of-the-art homomorphic encryption scheme, as well as the plaintext implementation. The experimental results demonstrated a significant performance improvement over the homomorphic encryption methods and a small computational overhead in comparison to the plaintext implementation. Conclusions The proposed PRESAGE provides an alternative solution for secure and efficient genomic data outsourcing in an untrusted cloud by using a hybrid framework that combines secure hardware and multiple crypto protocols.
Affiliation(s)
- Feng Chen
- Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA
- Chenghong Wang
- Department of Computer Science, Syracuse University, Syracuse, 13244, NY, USA
- Wenrui Dai
- Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA
- Xiaoqian Jiang
- Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA
- Noman Mohammed
- Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, MB, Canada
- Md Momin Al Aziz
- Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, MB, Canada
- Md Nazmus Sadat
- Department of Computer Science, University of Manitoba, Winnipeg, R3T 2N2, MB, Canada
- Cenk Sahinalp
- Department of Computer Science and Informatics, Indiana University, Bloomington, 47408, IN, USA
- Kristin Lauter
- Cryptography Group, Microsoft Research, San Diego, 92122, CA, USA
- Shuang Wang
- Department of Biomedical Informatics, University of California San Diego, La Jolla, 92093, CA, USA
|
22
|
|