1
|
Thomas M, Mackes N, Preuss-Dodhy A, Wieland T, Bundschus M. Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review. JMIR BIOINFORMATICS AND BIOTECHNOLOGY 2024; 5:e54332. [PMID: 38935957 PMCID: PMC11165293 DOI: 10.2196/54332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/06/2023] [Revised: 03/26/2024] [Accepted: 03/29/2024] [Indexed: 06/29/2024]
Abstract
BACKGROUND Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation. OBJECTIVE This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets. METHODS We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success. RESULTS From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants. CONCLUSIONS On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk.
Collapse
|
2
|
Brancato V, Esposito G, Coppola L, Cavaliere C, Mirabelli P, Scapicchio C, Borgheresi R, Neri E, Salvatore M, Aiello M. Standardizing digital biobanks: integrating imaging, genomic, and clinical data for precision medicine. J Transl Med 2024; 22:136. [PMID: 38317237 PMCID: PMC10845786 DOI: 10.1186/s12967-024-04891-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2023] [Accepted: 01/14/2024] [Indexed: 02/07/2024] Open
Abstract
Advancements in data acquisition and computational methods are generating a large amount of heterogeneous biomedical data from diagnostic domains such as clinical imaging, pathology, and next-generation sequencing (NGS), which help characterize individual differences in patients. However, this information needs to be available and suitable to promote and support scientific research and technological development, supporting the effective adoption of the precision medicine approach in clinical practice. Digital biobanks can catalyze this process, facilitating the sharing of curated and standardized imaging data, clinical, pathological and molecular data, crucial to enable the development of a comprehensive and personalized data-driven diagnostic approach in disease management and fostering the development of computational predictive models. This work aims to frame this perspective, first by evaluating the state of standardization of individual diagnostic domains and then by identifying challenges and proposing a possible solution towards an integrative approach that can guarantee the suitability of information that can be shared through a digital biobank. Our analysis of the state of the art shows the presence and use of reference standards in biobanks and, generally, digital repositories for each specific domain. Despite this, standardization to guarantee the integration and reproducibility of the numerical descriptors generated by each domain, e.g. radiomic, pathomic and -omic features, is still an open challenge. Based on specific use cases and scenarios, an integration model, based on the JSON format, is proposed that can help address this problem. Ultimately, this work shows how, with specific standardization and promotion efforts, the digital biobank model can become an enabling technology for the comprehensive study of diseases and the effective development of data-driven technologies at the service of precision medicine.
Collapse
Affiliation(s)
| | - Giuseppina Esposito
- Bio Check Up S.R.L, 80121, Naples, Italy
- Department of Advanced Biomedical Sciences, University of Naples Federico II, 80131, Naples, Italy
| | | | | | - Peppino Mirabelli
- UOS Laboratori di Ricerca e Biobanca, AORN Santobono-Pausilipon, Via Teresa Ravaschieri, 8, 80122, Naples, Italy
| | - Camilla Scapicchio
- Academic Radiology, Department of Translational Research, University of Pisa, via Roma, 67, 56126, Pisa, Italy
| | - Rita Borgheresi
- Academic Radiology, Department of Translational Research, University of Pisa, via Roma, 67, 56126, Pisa, Italy
| | - Emanuele Neri
- Academic Radiology, Department of Translational Research, University of Pisa, via Roma, 67, 56126, Pisa, Italy
| | | | | |
Collapse
|
3
|
Al Aziz MM, Anjum MM, Mohammed N, Jiang X. Generalized Genomic Data Sharing for Differentially Private Federated Learning. J Biomed Inform 2022; 132:104113. [PMID: 35690350 DOI: 10.1016/j.jbi.2022.104113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2021] [Revised: 03/28/2022] [Accepted: 06/05/2022] [Indexed: 10/18/2022]
Abstract
The success behind Machine Learning (ML) methods has largely been attributed to the quality and quantity of the available data which can spread across multiple owners. A Federated Learning (FL) from distributed datasets often provides a reliable solution that provides valuable insight. For a genomic dataset, such data have also proven to be sensitive which requires additional safety mechanisms before any sharing or ML operations. We propose a generalized gene expression data sharing method using a differentially private mechanism. Due to the large number of genes available, the data dimension is also reduced to accommodate smaller privacy budgets as we utilize an exponential mechanism to create a private histogram from numeric expression data. The output histogram can be used in any federated machine learning setting having multiple data owners. The proposed solution was submitted to genomic data security and privacy competition, iDash 2020 where it ranked third among 55 teams. We extend the proposed solution and experimented with two different machine learning algorithms and different settings. The experimental results show that it takes around 8 seconds to train a model while achieving 0.89 AUC with only a privacy budget of 5. The paper outlined a method to share gene expression data for Federated Learning using a privacy-preserving mechanism. Different experimental settings and recent competition results show the efficacy of the method which can be further extended to other genomic datasets and machine learning algorithms.
Collapse
Affiliation(s)
- Md Momin Al Aziz
- Computer Science, University of Manitoba, 66 Chancellors Circle, Winnipeg, R3T 2N2, Manitoba, Canada
| | - Md Monowar Anjum
- Computer Science, University of Manitoba, 66 Chancellors Circle, Winnipeg, R3T 2N2, Manitoba, Canada
| | - Noman Mohammed
- Computer Science, University of Manitoba, 66 Chancellors Circle, Winnipeg, R3T 2N2, Manitoba, Canada
| | - Xiaoqian Jiang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, 7000 Fannin Street, Houston, 77030, Texas, USA
| |
Collapse
|
4
|
Alsaffar MM, Hasan M, McStay GP, Sedky M. Digital DNA lifecycle security and privacy: an overview. Brief Bioinform 2022; 23:6518049. [PMID: 35106557 DOI: 10.1093/bib/bbab607] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2021] [Revised: 12/29/2021] [Accepted: 12/30/2021] [Indexed: 11/14/2022] Open
Abstract
DNA sequencing technologies have advanced significantly in the last few years leading to advancements in biomedical research which has improved personalised medicine and the discovery of new treatments for diseases. Sequencing technology advancement has also reduced the cost of DNA sequencing, which has led to the rise of direct-to-consumer (DTC) sequencing, e.g. 23andme.com, ancestry.co.uk, etc. In the meantime, concerns have emerged over privacy and security in collecting, handling, analysing and sharing DNA and genomic data. DNA data are unique and can be used to identify individuals. Moreover, those data provide information on people's current disease status and disposition, e.g. mental health or susceptibility for developing cancer. DNA privacy violation does not only affect the owner but also affects their close consanguinity due to its hereditary nature. This article introduces and defines the term 'digital DNA life cycle' and presents an overview of privacy and security threats and their mitigation techniques for predigital DNA and throughout the digital DNA life cycle. It covers DNA sequencing hardware, software and DNA sequence pipeline in addition to common privacy attacks and their countermeasures when DNA digital data are stored, queried or shared. Likewise, the article examines DTC genomic sequencing privacy and security.
Collapse
Affiliation(s)
- Muhalb M Alsaffar
- Department of Computing, AI and Robotics, School of Digital, Technologies and Arts, Staffordshire University, College Road, ST4 2DE, Staffordshire, United Kingdom
| | | | - Gavin P McStay
- Department of Biological Sciences, School of Health, Science and Wellbeing, Staffordshire University, College Road, Stoke-on-Trent, Staffordshire, ST4 2DE, United Kingdom
| | - Mohamed Sedky
- Department of Computing, AI and Robotics, School of Digital, Technologies and Arts, Staffordshire University, College Road, ST4 2DE, Staffordshire, United Kingdom
| |
Collapse
|
5
|
Torkzadehmahani R, Nasirigerdeh R, Blumenthal DB, Kacprowski T, List M, Matschinske J, Spaeth J, Wenke NK, Baumbach J. Privacy-Preserving Artificial Intelligence Techniques in Biomedicine. Methods Inf Med 2022; 61:e12-e27. [PMID: 35062032 PMCID: PMC9246509 DOI: 10.1055/s-0041-1740630] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]
Abstract
Background
Artificial intelligence (AI) has been successfully applied in numerous scientific domains. In biomedicine, AI has already shown tremendous potential, e.g., in the interpretation of next-generation sequencing data and in the design of clinical decision support systems.
Objectives
However, training an AI model on sensitive data raises concerns about the privacy of individual participants. For example, summary statistics of a genome-wide association study can be used to determine the presence or absence of an individual in a given dataset. This considerable privacy risk has led to restrictions in accessing genomic and other biomedical data, which is detrimental for collaborative research and impedes scientific progress. Hence, there has been a substantial effort to develop AI methods that can learn from sensitive data while protecting individuals' privacy.
Method
This paper provides a structured overview of recent advances in privacy-preserving AI techniques in biomedicine. It places the most important state-of-the-art approaches within a unified taxonomy and discusses their strengths, limitations, and open problems.
Conclusion
As the most promising direction, we suggest combining federated machine learning as a more scalable approach with other additional privacy-preserving techniques. This would allow to merge the advantages to provide privacy guarantees in a distributed way for biomedical applications. Nonetheless, more research is necessary as hybrid approaches pose new challenges such as additional network or computation overhead.
Collapse
Affiliation(s)
- Reihaneh Torkzadehmahani
- Institute for Artificial Intelligence in Medicine and Healthcare, Technical University of Munich, Munich, Germany
| | - Reza Nasirigerdeh
- Institute for Artificial Intelligence in Medicine and Healthcare, Technical University of Munich, Munich, Germany.,Klinikum Rechts der Isar, Technical University of Munich, Munich, Germany
| | - David B Blumenthal
- Department of Artificial Intelligence in Biomedical Engineering (AIBE), Friedrich-Alexander University Erlangen-Nürnberg (FAU), Erlangen, Germany
| | - Tim Kacprowski
- Division of Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of TU Braunschweig and Medical School Hannover, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Markus List
- Chair of Experimental Bioinformatics, Technical University of Munich, Munich, Germany
| | - Julian Matschinske
- E.U. Horizon2020 FeatureCloud Project Consortium.,Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Julian Spaeth
- E.U. Horizon2020 FeatureCloud Project Consortium.,Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Nina Kerstin Wenke
- E.U. Horizon2020 FeatureCloud Project Consortium.,Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Jan Baumbach
- E.U. Horizon2020 FeatureCloud Project Consortium.,Chair of Computational Systems Biology, University of Hamburg, Hamburg, Germany.,Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| |
Collapse
|
6
|
Wan Z, Vorobeychik Y, Xia W, Liu Y, Wooders M, Guo J, Yin Z, Clayton EW, Kantarcioglu M, Malin BA. Using game theory to thwart multistage privacy intrusions when sharing data. SCIENCE ADVANCES 2021; 7:eabe9986. [PMID: 34890225 PMCID: PMC8664254 DOI: 10.1126/sciadv.abe9986] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/14/2020] [Accepted: 10/25/2021] [Indexed: 06/13/2023]
Abstract
Person-specific biomedical data are now widely collected, but its sharing raises privacy concerns, specifically about the re-identification of seemingly anonymous records. Formal re-identification risk assessment frameworks can inform decisions about whether and how to share data; current techniques, however, focus on scenarios where the data recipients use only one resource for re-identification purposes. This is a concern because recent attacks show that adversaries can access multiple resources, combining them in a stage-wise manner, to enhance the chance of an attack’s success. In this work, we represent a re-identification game using a two-player Stackelberg game of perfect information, which can be applied to assess risk, and suggest an optimal data sharing strategy based on a privacy-utility tradeoff. We report on experiments with large-scale genomic datasets to show that, using game theoretic models accounting for adversarial capabilities to launch multistage attacks, most data can be effectively shared with low re-identification risk.
Collapse
Affiliation(s)
- Zhiyu Wan
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Yevgeniy Vorobeychik
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
| | - Weiyi Xia
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Yongtai Liu
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
| | - Myrna Wooders
- Department of Economics, Vanderbilt University, Nashville, TN 37235, USA
| | - Jia Guo
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
| | - Zhijun Yin
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| | - Ellen Wright Clayton
- Center for Biomedical Ethics and Society, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- School of Law, Vanderbilt University, Nashville, TN 37203, USA
- Department of Pediatrics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
| | - Murat Kantarcioglu
- Department of Computer Science, University of Texas at Dallas, Richardson, TX 75080, USA
- Institute for Quantitative Social Science, Harvard University, Cambridge, MA 02138, USA
- Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA
| | - Bradley A. Malin
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37212, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN 37203, USA
| |
Collapse
|
7
|
Ziegenhain C, Sandberg R. BAMboozle removes genetic variation from human sequence data for open data sharing. Nat Commun 2021; 12:6216. [PMID: 34711808 PMCID: PMC8553849 DOI: 10.1038/s41467-021-26152-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2021] [Accepted: 09/20/2021] [Indexed: 11/18/2022] Open
Abstract
The risks associated with re-identification of human genetic data are severely limiting open data sharing in life sciences, even in studies where donor-related genetic variant information is not of primary interest. Here, we developed BAMboozle, a versatile tool to eliminate critical types of sensitive genetic information in human sequence data by reverting aligned reads to the genome reference sequence. Applying BAMboozle to functional genomics data, such as single-cell RNA-seq (scRNA-seq) and scATAC-seq datasets, confirmed the removal of donor-related single nucleotide polymorphisms (SNPs) and indels in a manner that did not disclose the altered positions. Importantly, BAMboozle only removes the genetic sequence variants of the sample (i.e., donor) while preserving other important aspects of the raw sequence data. For example, BAMboozled scRNA-seq data contained accurate cell-type associated gene expression signatures, splice kinetic information, and can be used for methods benchmarking. Altogether, BAMboozle efficiently removes genetic variation in aligned sequence data, which represents a step forward towards open data sharing in many areas of genomics where the genetic variant information is not of primary interest.
Collapse
Affiliation(s)
- Christoph Ziegenhain
- Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden
| | - Rickard Sandberg
- Department of Cell and Molecular Biology, Karolinska Institute, Stockholm, Sweden.
| |
Collapse
|
8
|
Wirth FN, Meurers T, Johns M, Prasser F. Privacy-preserving data sharing infrastructures for medical research: systematization and comparison. BMC Med Inform Decis Mak 2021; 21:242. [PMID: 34384406 PMCID: PMC8359765 DOI: 10.1186/s12911-021-01602-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2021] [Accepted: 07/31/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Data sharing is considered a crucial part of modern medical research. Unfortunately, despite its advantages, it often faces obstacles, especially data privacy challenges. As a result, various approaches and infrastructures have been developed that aim to ensure that patients and research participants remain anonymous when data is shared. However, privacy protection typically comes at a cost, e.g. restrictions regarding the types of analyses that can be performed on shared data. What is lacking is a systematization making the trade-offs taken by different approaches transparent. The aim of the work described in this paper was to develop a systematization for the degree of privacy protection provided and the trade-offs taken by different data sharing methods. Based on this contribution, we categorized popular data sharing approaches and identified research gaps by analyzing combinations of promising properties and features that are not yet supported by existing approaches. METHODS The systematization consists of different axes. Three axes relate to privacy protection aspects and were adopted from the popular Five Safes Framework: (1) safe data, addressing privacy at the input level, (2) safe settings, addressing privacy during shared processing, and (3) safe outputs, addressing privacy protection of analysis results. Three additional axes address the usefulness of approaches: (4) support for de-duplication, to enable the reconciliation of data belonging to the same individuals, (5) flexibility, to be able to adapt to different data analysis requirements, and (6) scalability, to maintain performance with increasing complexity of shared data or common analysis processes. RESULTS Using the systematization, we identified three different categories of approaches: distributed data analyses, which exchange anonymous aggregated data, secure multi-party computation protocols, which exchange encrypted data, and data enclaves, which store pooled individual-level data in secure environments for access for analysis purposes. We identified important research gaps, including a lack of approaches enabling the de-duplication of horizontally distributed data or providing a high degree of flexibility. CONCLUSIONS There are fundamental differences between different data sharing approaches and several gaps in their functionality that may be interesting to investigate in future work. Our systematization can make the properties of privacy-preserving data sharing infrastructures more transparent and support decision makers and regulatory authorities with a better understanding of the trade-offs taken.
Collapse
Affiliation(s)
- Felix Nikolaus Wirth
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany.
| | - Thierry Meurers
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - Marco Johns
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| | - Fabian Prasser
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117, Berlin, Germany
| |
Collapse
|
9
|
Paige B, Bell J, Bellet A, Gascón A, Ezer D. Reconstructing Genotypes in Private Genomic Databases from Genetic Risk Scores. J Comput Biol 2021; 28:435-451. [PMID: 33400590 PMCID: PMC8165474 DOI: 10.1089/cmb.2020.0445] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Some organizations such as 23andMe and the UK Biobank have large genomic databases that they re-use for multiple different genome-wide association studies. Even research studies that compile smaller genomic databases often utilize these databases to investigate many related traits. It is common for the study to report a genetic risk score (GRS) model for each trait within the publication. Here, we show that under some circumstances, these GRS models can be used to recover the genetic variants of individuals in these genomic databases—a reconstruction attack. In particular, if two GRS models are trained by using a largely overlapping set of participants, it is often possible to determine the genotype for each of the individuals who were used to train one GRS model, but not the other. We demonstrate this theoretically and experimentally by analyzing the Cornell Dog Genome database. The accuracy of our reconstruction attack depends on how accurately we can estimate the rate of co-occurrence of pairs of single nucleotide polymorphisms within the private database, so if this aggregate information is ever released, it would drastically reduce the security of a private genomic database. Caution should be applied when using the same database for multiple analysis, especially when a small number of individuals are included or excluded from one part of the study.
Collapse
Affiliation(s)
- Brooks Paige
- The Alan Turing Institute, London, United Kingdom.,Department of Computer Science, University College London, London, United Kingdom
| | - James Bell
- The Alan Turing Institute, London, United Kingdom
| | - Aurélien Bellet
- Inria, Parc Scientifique de la Haute Borne Park Plaza, Villeneuve d'Ascq, France
| | - Adrià Gascón
- The Alan Turing Institute, London, United Kingdom.,University of Warwick, Coventry, United Kingdom
| | - Daphne Ezer
- The Alan Turing Institute, London, United Kingdom.,University of Warwick, Coventry, United Kingdom.,Department of Biology, University of York, York, United Kingdom
| |
Collapse
|
10
|
Bonomi L, Huang Y, Ohno-Machado L. Privacy challenges and research opportunities for genomic data sharing. Nat Genet 2020; 52:646-654. [PMID: 32601475 PMCID: PMC7761157 DOI: 10.1038/s41588-020-0651-0] [Citation(s) in RCA: 70] [Impact Index Per Article: 17.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2019] [Accepted: 05/22/2020] [Indexed: 12/17/2022]
Abstract
The sharing of genomic data holds great promise in advancing precision medicine and providing personalized treatments and other types of interventions. However, these opportunities come with privacy concerns, and data misuse could potentially lead to privacy infringement for individuals and their blood relatives. With the rapid growth and increased availability of genomic datasets, understanding the current genome privacy landscape and identifying the challenges in developing effective privacy-protecting solutions are imperative. In this work, we provide an overview of major privacy threats identified by the research community and examine the privacy challenges in the context of emerging direct-to-consumer genetic-testing applications. We additionally present general privacy-protection techniques for genomic data sharing and their potential applications in direct-to-consumer genomic testing and forensic analyses. Finally, we discuss limitations in current privacy-protection methods, highlight possible mitigation strategies and suggest future research opportunities for advancing genomic data sharing.
Collapse
Affiliation(s)
- Luca Bonomi
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA.
| | - Yingxiang Huang
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
| | - Lucila Ohno-Machado
- UCSD Health Department of Biomedical Informatics, University of California San Diego, La Jolla, CA, USA
- Division of Health Services Research & Development, VA San Diego Healthcare System, San Diego, La Jolla, CA, USA
| |
Collapse
|
11
|
Governance and Assessment of Future Spaces: A Discussion of Some Issues Raised by the Possibilities of Human–Machine Mergers. Development 2019. [DOI: 10.1057/s41301-019-00208-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/04/2023]
|
12
|
Mohammed Yakubu A, Chen YPP. Ensuring privacy and security of genomic data and functionalities. Brief Bioinform 2019; 21:511-526. [DOI: 10.1093/bib/bbz013] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2018] [Revised: 01/01/2019] [Accepted: 01/14/2019] [Indexed: 12/14/2022] Open
Abstract
Abstract
In recent times, the reduced cost of DNA sequencing has resulted in a plethora of genomic data that is being used to advance biomedical research and improve clinical procedures and healthcare delivery. These advances are revolutionizing areas in genome-wide association studies (GWASs), diagnostic testing, personalized medicine and drug discovery. This, however, comes with security and privacy challenges as the human genome is sensitive in nature and uniquely identifies an individual. In this article, we discuss the genome privacy problem and review relevant privacy attacks, classified into identity tracing, attribute disclosure and completion attacks, which have been used to breach the privacy of an individual. We then classify state-of-the-art genomic privacy-preserving solutions based on their application and computational domains (genomic aggregation, GWASs and statistical analysis, sequence comparison and genetic testing) that have been proposed to mitigate these attacks and compare them in terms of their underlining cryptographic primitives, security goals and complexities—computation and transmission overheads. Finally, we identify and discuss the open issues, research challenges and future directions in the field of genomic privacy. We believe this article will provide researchers with the current trends and insights on the importance and challenges of privacy and security issues in the area of genomics.
Collapse
Affiliation(s)
- Abukari Mohammed Yakubu
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC, Australia
| | - Yi-Ping Phoebe Chen
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC, Australia
| |
Collapse
|