1. Pilgram L, Meurers T, Malin B, Schaeffner E, Eckardt KU, Prasser F. The Costs of Anonymization: Case Study Using Clinical Data. J Med Internet Res 2024; 26:e49445. [PMID: 38657232 PMCID: PMC11079766 DOI: 10.2196/49445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 01/14/2024] [Accepted: 02/13/2024] [Indexed: 04/26/2024] Open
Abstract
BACKGROUND Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that they are no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set's statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice. OBJECTIVE The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. METHODS The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case-specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case-specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. RESULTS Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics.
For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case-specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. CONCLUSIONS Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case-specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. TRIAL REGISTRATION German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID) RR2-10.1093/ndt/gfr456.
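The abstract does not spell out how the 95% CI overlap is computed. As a rough illustration only, the sketch below uses one common definition from the disclosure-control literature: the length of the shared region is divided by each interval's own length, and the two fractions are averaged. The function name `ci_overlap` and the example intervals are hypothetical and are not taken from the study.

```python
def ci_overlap(orig, anon):
    """Average relative overlap of two confidence intervals.

    Each interval is a (lower, upper) tuple. Returns a value in [0, 1]:
    1.0 for identical intervals, 0.0 when they do not overlap.
    This is an assumed definition, not necessarily the one used in the study.
    """
    (lo_o, hi_o), (lo_a, hi_a) = orig, anon
    if hi_o <= lo_o or hi_a <= lo_a:
        raise ValueError("each interval must have positive length")
    # Length of the intersection; zero if the intervals are disjoint.
    shared = max(0.0, min(hi_o, hi_a) - max(lo_o, lo_a))
    # Average the overlap relative to each interval's own length.
    return 0.5 * (shared / (hi_o - lo_o) + shared / (hi_a - lo_a))


# Hypothetical estimates: original vs. anonymized 95% CI.
print(ci_overlap((1.0, 3.0), (1.0, 3.0)))  # identical intervals
print(ci_overlap((1.0, 3.0), (2.0, 4.0)))  # partial overlap
print(ci_overlap((1.0, 3.0), (5.0, 6.0)))  # disjoint intervals
```

Averaging over both intervals keeps the measure symmetric, so a very wide anonymized interval that merely contains the original one does not score as a perfect match.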
Affiliation(s)
- Lisa Pilgram
  - Junior Digital Clinician Scientist Program, Biomedical Innovation Academy, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Thierry Meurers
  - Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Bradley Malin
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Elke Schaeffner
  - Institute of Public Health, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Kai-Uwe Eckardt
  - Department of Nephrology and Medical Intensive Care, Charité-Universitätsmedizin Berlin, Berlin, Germany
  - Department of Nephrology and Hypertension, Universitätsklinikum Erlangen, Friedrich-Alexander University Erlangen-Nürnberg, Erlangen, Germany
- Fabian Prasser
  - Medical Informatics Group, Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
2. Prasser F, Riedel N, Wolter S, Corr D, Ludwig M. [Artificial intelligence and secure use of health data in the KI-FDZ project: anonymization, synthetization, and secure processing of real-world data]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2024; 67:171-179. [PMID: 38175194 PMCID: PMC10834625 DOI: 10.1007/s00103-023-03823-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 12/08/2023] [Indexed: 01/05/2024]
Abstract
The increasing digitization of the healthcare system is leading to a growing volume of health data. Leveraging this data beyond its initial collection purpose for secondary use can provide valuable insights into diagnostics, treatment processes, and the quality of care. The Health Data Lab (HDL) will provide infrastructure for this purpose. Both the protection of patient privacy and optimal analytical capabilities are of central importance in this context, and artificial intelligence (AI) provides two opportunities. First, it enables the analysis of large volumes of data with flexible models, which means that hidden correlations and patterns can be discovered. Second, synthetic (that is, artificial) data generated by AI can protect privacy. This paper describes the KI-FDZ project, which aims to investigate innovative technologies that can support the secure provision of health data for secondary research purposes. A multi-layered approach is investigated in which data-level measures can be combined in different ways with processing in secure environments. To this end, anonymization and synthetization methods, among others, are evaluated based on two concrete application examples. Moreover, it is examined how the creation of machine learning pipelines and the execution of AI algorithms can be supported in secure processing environments. Preliminary results indicate that this approach can achieve a high level of protection while maintaining data validity. The approach investigated in the project can be an important building block in the secure secondary use of health data.
Affiliation(s)
- Fabian Prasser
  - Center für Health Data Science, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Charitéplatz 1, 10117 Berlin, Germany
- Nico Riedel
  - Forschungsdatenzentrum Gesundheit, Bundesinstitut für Arzneimittel und Medizinprodukte (BfArM), Bonn, Germany
- Steven Wolter
  - Forschungsdatenzentrum Gesundheit, Bundesinstitut für Arzneimittel und Medizinprodukte (BfArM), Bonn, Germany
- Dörte Corr
  - Fraunhofer-Institut für Digitale Medizin MEVIS, Bremen, Germany
- Marion Ludwig
  - InGef - Institut für angewandte Gesundheitsforschung Berlin GmbH, Berlin, Germany
3. Liu C, Talaei-Khoei A, Storey VC, Peng G. A Review of the State of the Art of Data Quality in Healthcare. Journal of Global Information Management 2023. [DOI: 10.4018/jgim.316236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Effective implementation of strategic data-driven health analysis initiatives is heavily dependent on the quality of the electronic medical records that serve as the foundation from which to improve clinical decisions and, in turn, the quality of care. Although there is a large body of research on the quality of healthcare data, a systematic understanding of the methods used to address the issues of data quality is missing. This study analyzes research articles in health information systems/healthcare informatics on data quality to derive a set of dimensions for understanding data quality. Issues related to each dimension are identified, and the methods used to address them are summarized. The issues and methods can inform healthcare professionals of how to improve data practices.
Affiliation(s)
- Caihua Liu
  - Guilin University of Electronic Technology, China
4.
Abstract
Data from electronic health records (EHRs) are becoming accessible for use in clinical improvement projects and nursing research. But the data quality may not meet clinicians' and researchers' needs. EHR data, which are primarily collected to document clinical care, invariably contain errors and omissions. This article introduces nurses to the secondary analysis of EHR data, first outlining the steps in data acquisition and then describing a theory-based process for evaluating data quality and cleaning the data. This process involves methodically examining the data using six data quality dimensions (completeness, correctness, concordance, plausibility, currency, and relevance) and helps the clinician or researcher to determine whether data for each variable are fit for use. Two case studies offer examples of problems that can arise and their solutions.
Affiliation(s)
- Ann M Lyons
  - Ann M. Lyons is a medical informaticist at the University of Utah, Salt Lake City. Jonathan Dimas is the global medical affairs scientist at bioMérieux in Salt Lake City. Stephanie J. Richardson is retired from faculty and administrative positions at both the University of Utah College of Nursing and the Rocky Mountain University of Health Professions, Provo, UT. Katherine Sward is a professor of nursing in the University of Utah College of Nursing as well as an adjunct professor of biomedical informatics in the School of Medicine. Contact author: Ann M. Lyons, . The authors have disclosed no potential conflicts of interest, financial or otherwise.
5. Lee JS, Chew CJ, Liu JY, Chen YC, Tsai KY. Medical blockchain: Data sharing and privacy preserving of EHR based on smart contract. Journal of Information Security and Applications 2022. [DOI: 10.1016/j.jisa.2022.103117] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 11/29/2022]
6. Ahn NY, Park JE, Lee DH, Hong PC. Balancing Personal Privacy and Public Safety During COVID-19: The Case of South Korea. IEEE Access: Practical Innovations, Open Solutions 2020; 8:171325-171333. [PMID: 34786290 PMCID: PMC8545276 DOI: 10.1109/access.2020.3025971] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/25/2020] [Accepted: 09/20/2020] [Indexed: 05/09/2023]
Abstract
There has been vigorous debate on how different countries responded to the COVID-19 pandemic. To secure public safety, South Korea actively used personal information at the risk of personal privacy, whereas France encouraged voluntary cooperation at the risk of public safety. In this article, after a brief comparison of contextual differences with France, we focus on South Korea's approaches to epidemiological investigations. To evaluate the issues pertaining to personal privacy and public health, we examine the usage patterns of original data, de-identified data, and encrypted data. Our specific proposal discusses the COVID index, which considers collective infection, outbreak intensity, availability of medical infrastructure, and the death rate. Finally, we summarize the findings and lessons for future research and the policy implications.
Affiliation(s)
- Na Young Ahn
  - Institute of Cyber Security and Privacy, Korea University, Seoul 02841, South Korea
- Jun Eun Park
  - Department of Pediatrics, Korea University College of Medicine, Seoul 02842, South Korea
- Dong Hoon Lee
  - Institute of Cyber Security and Privacy and The Graduate School of Information Security, Korea University, Seoul 02841, South Korea
- Paul C. Hong
  - Information, Operations, and Technology Management, College of Business and Innovation, The University of Toledo, Toledo, OH 43606, USA
7. Eicher J, Bild R, Spengler H, Kuhn KA, Prasser F. A comprehensive tool for creating and evaluating privacy-preserving biomedical prediction models. BMC Med Inform Decis Mak 2020; 20:29. [PMID: 32046701 PMCID: PMC7014648 DOI: 10.1186/s12911-020-1041-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 06/27/2019] [Accepted: 01/30/2020] [Indexed: 02/07/2023] Open
Abstract
Background Modern data driven medical research promises to provide new insights into the development and course of disease and to enable novel methods of clinical decision support. To realize this, machine learning models can be trained to make predictions from clinical, paraclinical and biomolecular data. In this process, privacy protection and regulatory requirements need careful consideration, as the resulting models may leak sensitive personal information. To counter this threat, a wide range of methods for integrating machine learning with formal methods of privacy protection have been proposed. However, there is a significant lack of practical tools to create and evaluate such privacy-preserving models. In this software article, we report on our ongoing efforts to bridge this gap. Results We have extended the well-known ARX anonymization tool for biomedical data with machine learning techniques to support the creation of privacy-preserving prediction models. Our methods are particularly well suited for applications in biomedicine, as they preserve the truthfulness of data (e.g. no noise is added) and they are intuitive and relatively easy to explain to non-experts. Moreover, our implementation is highly versatile, as it supports binomial and multinomial target variables, different types of prediction models and a wide range of privacy protection techniques. All methods have been integrated into a sound framework that supports the creation, evaluation and refinement of models through intuitive graphical user interfaces. To demonstrate the broad applicability of our solution, we present three case studies in which we created and evaluated different types of privacy-preserving prediction models for breast cancer diagnosis, diagnosis of acute inflammation of the urinary system and prediction of the contraceptive method used by women. 
In this process, we also used a wide range of different privacy models (k-anonymity, differential privacy and a game-theoretic approach) as well as different data transformation techniques. Conclusions With the tool presented in this article, accurate prediction models can be created that preserve the privacy of individuals represented in the training set in a variety of threat scenarios. Our implementation is available as open source software.
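Of the privacy models named in this abstract, k-anonymity is the simplest to state: every combination of quasi-identifier values must be shared by at least k records. The following is a minimal sketch of that check in plain Python. It does not use or mirror the ARX API, and the records, column names, and the helper `is_k_anonymous` are made up for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs >= k times."""
    # Count how many records share each combination of quasi-identifier values.
    groups = Counter(
        tuple(rec[col] for col in quasi_identifiers) for rec in records
    )
    return all(size >= k for size in groups.values())


# Hypothetical records after generalizing age to 10-year bands.
records = [
    {"age": "30-39", "sex": "F", "diagnosis": "A"},
    {"age": "30-39", "sex": "F", "diagnosis": "B"},
    {"age": "40-49", "sex": "M", "diagnosis": "A"},
    {"age": "40-49", "sex": "M", "diagnosis": "C"},
]

print(is_k_anonymous(records, ["age", "sex"], k=2))
print(is_k_anonymous(records, ["age", "sex", "diagnosis"], k=2))
```

In this toy data set, grouping on age band and sex yields two groups of two records each, whereas adding the diagnosis column makes every record unique, so the second check fails for k = 2.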
Affiliation(s)
- Johanna Eicher
  - School of Medicine, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany
- Raffael Bild
  - School of Medicine, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany
- Helmut Spengler
  - School of Medicine, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany
- Klaus A Kuhn
  - School of Medicine, Technical University of Munich, Ismaninger Str. 22, Munich, 81675, Germany
- Fabian Prasser
  - Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Straße 2, Berlin, 10178, Germany
  - Charité - Universitätsmedizin Berlin, Charitéplatz 1, Berlin, 10117, Germany
8. Liu X, Ma W, Cao H. NPMA: A Novel Privacy-Preserving Mutual Authentication in TMIS for Mobile Edge-Cloud Architecture. J Med Syst 2019; 43:318. [PMID: 31522286 DOI: 10.1007/s10916-019-1444-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/27/2019] [Accepted: 08/28/2019] [Indexed: 11/28/2022]
Abstract
Mobile Edge-Cloud Network is a new network structure after fog-cloud computing, where service and data computing are scattered in the most logical, nearby and efficient place. It provides better services than fog-cloud computing, with better performance at reasonably low cost, and allows users to eliminate numerous limitations inherent in fog-cloud computing, although it inherits the security and privacy issues of fog-cloud computing. A novel privacy-preserving mutual authentication scheme in TMIS for mobile Edge-Cloud architecture (abbreviated to NPMA) is constructed in this paper. The NPMA scheme not only mitigates some weaknesses of fog-cloud computing, but also has other advantages. First, the NPMA scheme supports anonymity of patients (edge-servers) and forward-backward untraceability (with traceability when needed), since their identities are hidden in two distinct dynamic anonyms and a static one, and only the trusted center can recover their real identities when needed. Second, each edge-server shares a secret value, which realizes authentication with extremely low computational cost in the authentication phase. Finally, the NPMA scheme is proven secure against passive and active attacks under the elliptic curve computational Diffie-Hellman problem (ECDHP) assumption in the random oracle model. Hence, it achieves the required security properties and outperforms prior approaches in terms of energy and computational costs.
Affiliation(s)
- Xiaoxue Liu
  - Xidian University, No.2, South Taibai Road, Yanta District, Xi'an, China
- Wenping Ma
  - Xidian University, No.2, South Taibai Road, Yanta District, Xi'an, China
- Hao Cao
  - Xidian University, No.2, South Taibai Road, Yanta District, Xi'an, China
  - Anhui Science and Technology University, Chuzhou, 233100, China
9. Platt J, Raj M, Büyüktür AG, Trinidad MG, Olopade O, Ackerman MS, Kardia S. Willingness to Participate in Health Information Networks with Diverse Data Use: Evaluating Public Perspectives. EGEMS (Washington, DC) 2019; 7:33. [PMID: 31367650 PMCID: PMC6659576 DOI: 10.5334/egems.288] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/22/2018] [Accepted: 05/16/2019] [Indexed: 11/20/2022]
Abstract
INTRODUCTION Health information generated by health care encounters, research enterprises, and public health is increasingly interoperable and shareable across uses and users. This paper examines the US public's willingness to be a part of multi-user health information networks and identifies factors associated with that willingness. METHODS Using a probability-based sample (n = 890), we examined the univariable and multivariable relationships between willingness to participate in health information networks and demographic factors, trust, altruism, beliefs about the public's ethical obligation to participate in research, privacy, medical deception, and policy and governance using linear regression modeling. RESULTS Willingness to be a part of a multi-user network that includes health care providers, mental health, social services, research, or quality improvement is low (26 percent-7.4 percent, depending on the user). Using stepwise regression, we identified a model that explained 42.6 percent of the variability in willingness to participate and included nine statistically significant factors associated with the outcome: Trust in the health system, confidence in policy, the belief that people have an obligation to participate in research, the belief that health researchers are accountable for conducting ethical research, the desire to give permission, education, concerns about insurance, privacy, and preference for notification. DISCUSSION Our results suggest willingness to be a part of multi-user data networks is low, but that attention to governance may increase willingness. Building trust to enable acceptance of multi-use data networks will require a commitment to aligning data access practices with the expectations of the people whose data is being used.
Affiliation(s)
- Jodyn Platt
  - University of Michigan Medical School, Department of Learning Health Sciences, US
- Minakshi Raj
  - University of Michigan School of Public Health, Department of Health Management and Policy, US
- Ayşe G. Büyüktür
  - University of Michigan School of Information and Michigan Institute for Clinical and Health Research, US
- M. Grace Trinidad
  - University of Michigan Medical School, Department of Learning Health Sciences, US
- Mark S. Ackerman
  - University of Michigan School of Information, College of Engineering, EECS, and Medical School, Department of Learning Health Systems, US
- Sharon Kardia
  - University of Michigan School of Public Health, Department of Epidemiology, US
10. Chevrier R, Foufi V, Gaudet-Blavignac C, Robert A, Lovis C. Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review. J Med Internet Res 2019; 21:e13484. [PMID: 31152528 PMCID: PMC6658290 DOI: 10.2196/13484] [Citation(s) in RCA: 42] [Impact Index Per Article: 8.4] [Received: 01/24/2019] [Revised: 03/29/2019] [Accepted: 04/26/2019] [Indexed: 01/19/2023] Open
Abstract
Background The secondary use of health data is central to biomedical research in the era of data science and precision medicine. National and international initiatives, such as the Global Open Findable, Accessible, Interoperable, and Reusable (GO FAIR) initiative, are supporting this approach in different ways (eg, making the sharing of research data mandatory or improving the legal and ethical frameworks). Preserving patients’ privacy is crucial in this context. De-identification and anonymization are the two most common terms used to refer to the technical approaches that protect privacy and facilitate the secondary use of health data. However, it is difficult to find a consensus on the definitions of the concepts or on the reliability of the techniques used to apply them. A comprehensive review is needed to better understand the domain, its capabilities, its challenges, and the ratio of risk between the data subjects’ privacy on one side, and the benefit of scientific advances on the other. Objective This work aims at better understanding how the research community comprehends and defines the concepts of de-identification and anonymization. A rich overview should also provide insights into the use and reliability of the methods. Six aspects will be studied: (1) terminology and definitions, (2) backgrounds and places of work of the researchers, (3) reasons for anonymizing or de-identifying health data, (4) limitations of the techniques, (5) legal and ethical aspects, and (6) recommendations of the researchers. Methods Based on a scoping review protocol designed a priori, MEDLINE was searched for publications discussing de-identification or anonymization and published between 2007 and 2017. The search was restricted to MEDLINE to focus on the life sciences community. The screening process was performed by two reviewers independently. 
Results After searching 7972 records that matched at least one search term, 135 publications were screened and 60 full-text articles were included. (1) Terminology: Definitions of the terms de-identification and anonymization were provided in less than half of the articles (29/60, 48%). When both terms were used (41/60, 68%), their meanings divided the authors into two equal groups (19/60, 32%, each) with opposed views. The remaining articles (3/60, 5%) were equivocal. (2) Backgrounds and locations: Research groups were based predominantly in North America (31/60, 52%) and in the European Union (22/60, 37%). The authors came from 19 different domains; computer science (91/248, 36.7%), biomedical informatics (47/248, 19.0%), and medicine (38/248, 15.3%) were the most prevalent ones. (3) Purpose: The main reason declared for applying these techniques is to facilitate biomedical research. (4) Limitations: Progress is made on specific techniques but, overall, limitations remain numerous. (5) Legal and ethical aspects: Differences exist between nations in the definitions, approaches, and legal practices. (6) Recommendations: The combination of organizational, legal, ethical, and technical approaches is necessary to protect health data. Conclusions Interest is growing for privacy-enhancing techniques in the life sciences community. This interest crosses scientific boundaries, involving primarily computer science, biomedical informatics, and medicine. The variability observed in the use of the terms de-identification and anonymization emphasizes the need for clearer definitions as well as for better education and dissemination of information on the subject. The same observation applies to the methods. Several legislations, such as the American Health Insurance Portability and Accountability Act (HIPAA) and the European General Data Protection Regulation (GDPR), regulate the domain. 
Using the definitions they provide could help address the variable use of these two concepts in the research community.
Affiliation(s)
- Raphaël Chevrier
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Vasiliki Foufi
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Christophe Gaudet-Blavignac
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Arnaud Robert
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Christian Lovis
  - Division of Medical Information Sciences, University Hospitals of Geneva, Geneva, Switzerland
  - Faculty of Medicine, University of Geneva, Geneva, Switzerland
11. Learning quasi-identifiers for privacy-preserving exchanges: a rough set theory approach. Granular Computing 2018. [DOI: 10.1007/s41066-018-0127-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
12.
Abstract
INTRODUCTION This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers. OBJECTIVES The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level with a specific focus on data integration and sharing. GOVERNANCE AND POLICIES Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies safeguarding data use and sharing by technical and organizational measures providing highest levels of data protection. One of our central policies is that adequate methods of data sharing for each use case and project will be selected based on rigorous risk and threat analyses. Interdisciplinary groups have been installed in order to manage change. 
ARCHITECTURAL FRAMEWORK AND METHODOLOGY The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing. USE CASES From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. following the needs of physicians and researchers and aiming at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with focuses on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios. DISCUSSION Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already prototypically implemented and tested the most important components of our architecture.
Affiliation(s)
- Fabian Prasser
  - Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
  - Correspondence to: Dr. Fabian Prasser, Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Ismaninger Straße 22, 81675 Munich, Germany
- Oliver Kohlbacher
  - Department of Computer Science, Center for Bioinformatics and Quantitative Biology Center, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
  - Max Planck Institute for Developmental Biology, Tübingen, Germany
- Ulrich Mansmann
  - Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University Munich, Munich, Germany
- Bernhard Bauer
  - Department of Computer Science, University of Augsburg, Augsburg, Germany
- Klaus A. Kuhn
  - Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
13. Li W, Liu BM, Liu D, Liu RP, Wang P, Luo S, Ni W. Unified Fine-Grained Access Control for Personal Health Records in Cloud Computing. IEEE J Biomed Health Inform 2018; 23:1278-1289. [PMID: 29994490 DOI: 10.1109/jbhi.2018.2850304] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 11/07/2022]
Abstract
Attribute-based encryption has been a promising encryption technology to secure personal health records (PHRs) sharing in cloud computing. PHRs consist of the patient data often collected from various sources including hospitals and general practice centres. Different patients' access policies have a common access sub-policy. In this paper, we propose a novel attribute-based encryption scheme for fine-grained and flexible access control to PHRs data in cloud computing. The scheme generates shared information by the common access sub-policy, which is based on different patients' access policies. Then, the scheme combines the encryption of PHRs from different patients. Therefore, both time consumption of encryption and decryption can be reduced. Medical staff require varying levels of access to PHRs. The proposed scheme can also support multi-privilege access control so that medical staff can access the required level of information while maximizing patient privacy. Through implementation and simulation, we demonstrate that the proposed scheme is efficient in terms of time. Moreover, we prove the security of the proposed scheme based on security of the ciphertext-policy attribute-based encryption scheme.