Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Prasser F, Kohlmayer F, Spengler H, Kuhn KA. A Scalable and Pragmatic Method for the Safe Sharing of High-Quality Health Data. IEEE J Biomed Health Inform 2017;22:611-622. [PMID: 28358693 DOI: 10.1109/jbhi.2017.2676880] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

For:	Prasser F, Kohlmayer F, Spengler H, Kuhn KA. A Scalable and Pragmatic Method for the Safe Sharing of High-Quality Health Data. IEEE J Biomed Health Inform 2017;22:611-622. [PMID: 28358693 DOI: 10.1109/jbhi.2017.2676880] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Number

Cited by Other Article(s)

Pilgram L, Meurers T, Malin B, Schaeffner E, Eckardt KU, Prasser F. The Costs of Anonymization: Case Study Using Clinical Data. J Med Internet Res 2024;26:e49445. [PMID: 38657232 PMCID: PMC11079766 DOI: 10.2196/49445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2023] [Revised: 01/14/2024] [Accepted: 02/13/2024] [Indexed: 04/26/2024] Open

Abstract

BACKGROUND

Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that it is no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice.

OBJECTIVE

The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study.

METHODS

The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results.

RESULTS

Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy.

CONCLUSIONS

Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data.

TRIAL REGISTRATION

German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971.

INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID)

RR2-10.1093/ndt/gfr456.

Collapse

Prasser F, Riedel N, Wolter S, Corr D, Ludwig M. [Artificial intelligence and secure use of health data in the KI-FDZ project: anonymization, synthetization, and secure processing of real-world data]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2024;67:171-179. [PMID: 38175194 PMCID: PMC10834625 DOI: 10.1007/s00103-023-03823-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024]

Liu C, Talaei-Khoei A, Storey VC, Peng G. A Review of the State of the Art of Data Quality in Healthcare. JOURNAL OF GLOBAL INFORMATION MANAGEMENT 2023. [DOI: 10.4018/jgim.316236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/04/2023]

Lyons AM, Dimas J, Richardson SJ, Sward K. Assessing EHR Data for Use in Clinical Improvement and Research. Am J Nurs 2022;122:32-41. [PMID: 35551125 DOI: 10.1097/01.naj.0000832728.09164.3f] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]

Lee JS, Chew CJ, Liu JY, Chen YC, Tsai KY. Medical blockchain: Data sharing and privacy preserving of EHR based on smart contract. JOURNAL OF INFORMATION SECURITY AND APPLICATIONS 2022. [DOI: 10.1016/j.jisa.2022.103117] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Ahn NY, Park JE, Lee DH, Hong PC. Balancing Personal Privacy and Public Safety During COVID-19: The Case of South Korea. IEEE ACCESS : PRACTICAL INNOVATIONS, OPEN SOLUTIONS 2020;8:171325-171333. [PMID: 34786290 PMCID: PMC8545276 DOI: 10.1109/access.2020.3025971] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Accepted: 09/20/2020] [Indexed: 05/09/2023]

Eicher J, Bild R, Spengler H, Kuhn KA, Prasser F. A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models. BMC Med Inform Decis Mak 2020;20:29. [PMID: 32046701 PMCID: PMC7014648 DOI: 10.1186/s12911-020-1041-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2019] [Accepted: 01/30/2020] [Indexed: 02/07/2023] Open

Abstract

Background

Modern data driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools to create and evaluate such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap.

Results

We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g. no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system and prediction of the contraceptive method used by women. In this process, we also used a wide range of different privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques.

Conclusions

With the tool presented in this article, accurate prediction models can be created that preserve the privacy of individuals represented in the training set in a variety of threat scenarios. Our implementation is available as open source software.

Collapse

Liu X, Ma W, Cao H. NPMA: A Novel Privacy-Preserving Mutual Authentication in TMIS for Mobile Edge-Cloud Architecture. J Med Syst 2019;43:318. [PMID: 31522286 DOI: 10.1007/s10916-019-1444-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2019] [Accepted: 08/28/2019] [Indexed: 11/28/2022]

Platt J, Raj M, Büyüktür AG, Trinidad MG, Olopade O, Ackerman MS, Kardia S. Willingness to Participate in Health Information Networks with Diverse Data Use: Evaluating Public Perspectives. EGEMS (WASHINGTON, DC) 2019;7:33. [PMID: 31367650 PMCID: PMC6659576 DOI: 10.5334/egems.288] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/22/2018] [Accepted: 05/16/2019] [Indexed: 11/20/2022]

Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. J Med Internet Res 2019;21:e13484. [PMID: 31152528 PMCID: PMC6658290 DOI: 10.2196/13484] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/24/2019] [Revised: 03/29/2019] [Accepted: 04/26/2019] [Indexed: 01/19/2023] Open

Abstract

Background

The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients’ privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects’ privacy on one side, and the benefit of scientific advances on the other.

Objective

This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers.

Methods

Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently.

Results

After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data.

Conclusions

Interest is growing for privacy-enhancing techniques in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several legislations, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. Using the definitions they provide could help address the variable use of these two concepts in the research community.

Collapse

Learning quasi-identifiers for privacy-preserving exchanges: a rough set theory approach. GRANULAR COMPUTING 2018. [DOI: 10.1007/s41066-018-0127-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Prasser F, Kohlbacher O, Mansmann U, Bauer B, Kuhn KA. Data Integration for Future Medicine (DIFUTURE). Methods Inf Med 2018;57:e57-e65. [PMID: 30016812 PMCID: PMC6178202 DOI: 10.3414/me17-02-0022] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]

Abstract

INTRODUCTION

This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers.

OBJECTIVES

The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level with a specific focus on data integration and sharing.

GOVERNANCE AND POLICIES

Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies safeguarding data use and sharing by technical and organizational measures providing highest levels of data protection. One of our central policies is that adequate methods of data sharing for each use case and project will be selected based on rigorous risk and threat analyses. Interdisciplinary groups have been installed in order to manage change.

ARCHITECTURAL FRAMEWORK AND METHODOLOGY

The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing.

USE CASES

From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. following the needs of physicians and researchers and aiming at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with focuses on neurology, oncology and further disease entities. Our early uses cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios.

DISCUSSION

Own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already prototypically implemented and tested the most important components of our architecture.

Collapse

Li W, Liu BM, Liu D, Liu RP, Wang P, Luo S, Ni W. Unified Fine-Grained Access Control for Personal Health Records in Cloud Computing. IEEE J Biomed Health Inform 2018;23:1278-1289. [PMID: 29994490 DOI: 10.1109/jbhi.2018.2850304] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]