1
Abu Attieh H, Müller A, Wirth FN, Prasser F. Pseudonymization tools for medical research: a systematic review. BMC Med Inform Decis Mak 2025; 25:128. [PMID: 40075358 PMCID: PMC11905493 DOI: 10.1186/s12911-025-02958-0]
Abstract
BACKGROUND Pseudonymization is an important technique for the secure and compliant use of medical data in research. At its core, pseudonymization is a process in which directly identifying information is separated from medical research data. Due to its importance, a wide range of pseudonymization tools and services have been developed, and researchers face the challenge of selecting an appropriate tool for their research projects. This review aims to address this challenge by systematically comparing existing tools. METHODS A systematic review was performed and is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines where applicable. The search covered PubMed and Web of Science to identify pseudonymization tools documented in the scientific literature. The tools were assessed based on predefined criteria across four key dimensions that describe researchers' requirements: (1) single-center vs. multi-center use, (2) short-term vs. long-term projects, (3) small data vs. big data processing, and (4) integration vs. standalone functionality. RESULTS From an initial pool of 1,052 papers, 92 were selected for detailed full-text review after the title and abstract screening. This led to the identification of 20 pseudonymization tools, of which 10 met our inclusion criteria and were assessed. The results show that there are differences between the tools that make them more or less suited for research projects differing with regard to the dimensions described above, enabling us to provide targeted recommendations. CONCLUSIONS The landscape of existing pseudonymization tools is heterogeneous, and researchers need to carefully select the appropriate solutions for their research projects. Our findings highlight two Software-as-a-Service-based solutions that enable centralized use without local infrastructure, one tool for retrospective pseudonymization of existing databases, two tools suitable for local deployment in smaller, short-term projects, and two tools well-suited for local deployment in large, multi-center studies.
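To make the core idea behind pseudonymization concrete, the following minimal Python sketch (illustrative only, not taken from any of the reviewed tools) separates directly identifying attributes from the research payload and links the two stores via a randomly generated pseudonym; the field names are assumptions.

```python
import secrets

def pseudonymize(records, identifying_keys=("name", "address")):
    """Split each record into an identity store and a research data store."""
    identity_store = {}   # pseudonym -> directly identifying attributes
    research_store = []   # research data carrying only the pseudonym
    for record in records:
        pseudonym = secrets.token_hex(8)  # random, non-derivable identifier
        identity_store[pseudonym] = {k: record[k] for k in identifying_keys if k in record}
        payload = {k: v for k, v in record.items() if k not in identifying_keys}
        payload["pseudonym"] = pseudonym
        research_store.append(payload)
    return identity_store, research_store

identities, research_data = pseudonymize(
    [{"name": "Jane Doe", "address": "Main St 1", "diagnosis": "I10"}]
)
print(research_data)  # e.g. [{'diagnosis': 'I10', 'pseudonym': '...'}]
```

Re-identification then requires access to the separately held identity store, which is the separation the review's requirement dimensions build on.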
Affiliation(s)
- Hammam Abu Attieh
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany.
- Armin Müller
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany
- Felix Nikolaus Wirth
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany
- Fabian Prasser
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany
2
Tarride JE, Okoh A, Aryal K, Prada C, Milinkovic D, Keepanasseril A, Iorio A. Scoping review of the recommendations and guidance for improving the quality of rare disease registries. Orphanet J Rare Dis 2024; 19:187. [PMID: 38711103 PMCID: PMC11075280 DOI: 10.1186/s13023-024-03193-y]
Abstract
BACKGROUND Rare disease registries (RDRs) are valuable tools for improving clinical care and advancing research. However, they often vary qualitatively, structurally, and operationally in ways that can determine their potential utility as a source of evidence to support decision-making regarding the approval and funding of new treatments for rare diseases. OBJECTIVES The goal of this research project was to review the literature on rare disease registries and identify best practices to improve the quality of RDRs. METHODS In this scoping review, we searched MEDLINE and EMBASE as well as the websites of regulatory bodies and health technology assessment agencies from 2010 to April 2023 for literature offering guidance or recommendations to ensure, improve, or maintain quality RDRs. RESULTS The search yielded 1,175 unique references, of which 64 met the inclusion criteria. The characteristics of RDRs deemed to be relevant to their quality align with three main domains and several sub-domains considered to be best practices for quality RDRs: (1) governance (registry purpose and description; governance structure; stakeholder engagement; sustainability; ethics/legal/privacy; data governance; documentation; and training and support); (2) data (standardized disease classification; common data elements; data dictionary; data collection; data quality and assurance; and data analysis and reporting); and (3) information technology (IT) infrastructure (physical and virtual infrastructure; and software infrastructure guided by the FAIR principles of Findability, Accessibility, Interoperability, and Reusability). CONCLUSIONS Although RDRs face numerous challenges due to their small and dispersed populations, they can generate quality data to support healthcare decision-making through the use of standards and principles on strong governance, quality data practices, and IT infrastructure.
Affiliation(s)
- J E Tarride
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Centre for Health Economics and Policy Analysis (CHEPA), McMaster University, Hamilton, Canada
- Programs for the Assessment of Technologies in Health (PATH), The Research Institute of St. Joe's Hamilton, St. Joseph's Healthcare Hamilton, Hamilton, ON, Canada
- A Okoh
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- K Aryal
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- C Prada
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Deborah Milinkovic
- Centre for Health Economics and Policy Analysis (CHEPA), McMaster University, Hamilton, Canada.
- A Keepanasseril
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- A Iorio
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
3
Abu Attieh H, Neves DT, Guedes M, Mirandola M, Dellacasa C, Rossi E, Prasser F. A Scalable Pseudonymization Tool for Rapid Deployment in Large Biomedical Research Networks: Development and Evaluation Study. JMIR Med Inform 2024; 12:e49646. [PMID: 38654577 PMCID: PMC11063579 DOI: 10.2196/49646]
Abstract
Background The SARS-CoV-2 pandemic has demonstrated once again that rapid collaborative research is essential for the future of biomedicine. Large research networks are needed to collect, share, and reuse data and biosamples to generate collaborative evidence. However, setting up such networks is often complex and time-consuming, as common tools and policies are needed to ensure interoperability and the required flows of data and samples, especially for handling personal data and the associated data protection issues. In biomedical research, pseudonymization detaches directly identifying details from biomedical data and biosamples and connects them using secure identifiers, the so-called pseudonyms. This protects privacy by design but allows the necessary linkage and reidentification. Objective Although pseudonymization is used in almost every biomedical study, there are currently no pseudonymization tools that can be rapidly deployed across many institutions. Moreover, using centralized services is often not possible, for example, when data are reused and consent for this type of data processing is lacking. We present the ORCHESTRA Pseudonymization Tool (OPT), developed under the umbrella of the ORCHESTRA consortium, which faced exactly these challenges when it came to rapidly establishing a large-scale research network in the context of the rapid pandemic response in Europe. Methods To overcome challenges caused by the heterogeneity of IT infrastructures across institutions, the OPT was developed based on programmable runtime environments available at practically every institution: office suites. The software is highly configurable and provides many features, from subject and biosample registration to record linkage and the printing of machine-readable codes for labeling biosample tubes. Special care has been taken to ensure that the algorithms implemented are efficient so that the OPT can be used to pseudonymize large data sets, which we demonstrate through a comprehensive evaluation. Results The OPT is available for Microsoft Office and LibreOffice, so it can be deployed on Windows, Linux, and MacOS. It provides multiuser support and is configurable to meet the needs of different types of research projects. Within the ORCHESTRA research network, the OPT has been successfully deployed at 13 institutions in 11 countries in Europe and beyond. As of June 2023, the software manages data about more than 30,000 subjects and 15,000 biosamples. Over 10,000 labels have been printed. The results of our experimental evaluation show that the OPT offers practical response times for all major functionalities, pseudonymizing 100,000 subjects in 10 seconds using Microsoft Excel and in 54 seconds using LibreOffice. Conclusions Innovative solutions are needed to make the process of establishing large research networks more efficient. The OPT, which leverages the runtime environment of common office suites, can be used to rapidly deploy pseudonymization and biosample management capabilities across research networks. The tool is highly configurable and available as open-source software.
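As an illustration of the kind of batch pseudonym generation described above, here is a hypothetical Python sketch; it is not the OPT implementation (which runs inside office suites), and the alphabet, pseudonym length, and subject ID format are assumptions chosen for readability on printed labels.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # easy to print and scan on tube labels

def generate_pseudonyms(subject_ids, length=10):
    """Assign a unique, fixed-length random pseudonym to every subject identifier."""
    mapping, used = {}, set()
    for subject_id in subject_ids:
        while True:
            candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if candidate not in used:  # enforce uniqueness within the batch
                used.add(candidate)
                mapping[subject_id] = candidate
                break
    return mapping

pseudonyms = generate_pseudonyms([f"SUBJ-{i:06d}" for i in range(100_000)])
print(len(set(pseudonyms.values())))  # 100000 distinct pseudonyms
```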
Affiliation(s)
- Hammam Abu Attieh
- Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Diogo Telmo Neves
- Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Mariana Guedes
- Infection and Antimicrobial Resistance Control and Prevention Unit, Centro Hospitalar Universitário São João, Porto, Portugal
- Infectious Diseases and Microbiology Division, Hospital Universitario Virgen Macarena, Sevilla, Spain
- Department of Medicine, University of Sevilla/Instituto de Biomedicina de Sevilla (IBiS)/Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Massimo Mirandola
- Infectious Diseases Division, Diagnostic and Public Health Department, University of Verona, Verona, Italy
- Chiara Dellacasa
- High Performance Computing (HPC) Department, CINECA - Consorzio Interuniversitario, Bologna, Italy
- Elisa Rossi
- High Performance Computing (HPC) Department, CINECA - Consorzio Interuniversitario, Bologna, Italy
- Fabian Prasser
- Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
4
Wündisch E, Hufnagl P, Brunecker P, Meier zu Ummeln S, Träger S, Kopp M, Prasser F, Weber J. Development of a Trusted Third Party at a Large University Hospital: Design and Implementation Study. JMIR Med Inform 2024; 12:e53075. [PMID: 38632712 PMCID: PMC11040164 DOI: 10.2196/53075]
Abstract
Background Pseudonymization has become a best practice to securely manage the identities of patients and study participants in medical research projects and data sharing initiatives. This method offers the advantage of not requiring the direct identification of data to support various research processes while still allowing for advanced processing activities, such as data linkage. Often, pseudonymization and related functionalities are bundled in specific technical and organization units known as trusted third parties (TTPs). However, pseudonymization can significantly increase the complexity of data management and research workflows, necessitating adequate tool support. Common tasks of TTPs include supporting the secure registration and pseudonymization of patient and sample identities as well as managing consent. Objective Despite the challenges involved, little has been published about successful architectures and functional tools for implementing TTPs in large university hospitals. The aim of this paper is to fill this research gap by describing the software architecture and tool set developed and deployed as part of a TTP established at Charité - Universitätsmedizin Berlin. Methods The infrastructure for the TTP was designed to provide a modular structure while keeping maintenance requirements low. Basic functionalities were realized with the free MOSAIC tools. However, supporting common study processes requires implementing workflows that span different basic services, such as patient registration, followed by pseudonym generation and concluded by consent collection. To achieve this, an integration layer was developed to provide a unified Representational state transfer (REST) application programming interface (API) as a basis for more complex workflows. Based on this API, a unified graphical user interface was also implemented, providing an integrated view of information objects and workflows supported by the TTP. The API was implemented using Java and Spring Boot, while the graphical user interface was implemented in PHP and Laravel. Both services use a shared Keycloak instance as a unified management system for roles and rights. Results By the end of 2022, the TTP has already supported more than 10 research projects since its launch in December 2019. Within these projects, more than 3000 identities were stored, more than 30,000 pseudonyms were generated, and more than 1500 consent forms were submitted. In total, more than 150 people regularly work with the software platform. By implementing the integration layer and the unified user interface, together with comprehensive roles and rights management, the effort for operating the TTP could be significantly reduced, as personnel of the supported research projects can use many functionalities independently. Conclusions With the architecture and components described, we created a user-friendly and compliant environment for supporting research projects. We believe that the insights into the design and implementation of our TTP can help other institutions to efficiently and effectively set up corresponding structures.
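The following hypothetical Python sketch illustrates how a client could drive such a cross-service workflow (identity registration, pseudonym generation, consent recording) through a unified REST API; the endpoint paths, payload fields, and token handling are assumptions made for illustration and do not describe the actual Charité TTP interface.

```python
import requests

BASE_URL = "https://ttp.example.org/api"  # placeholder base URL
ACCESS_TOKEN = "..."                      # e.g. obtained beforehand from an identity provider such as Keycloak

def register_and_consent(patient, project):
    """Hypothetical workflow: register identity, generate pseudonym, record consent."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {ACCESS_TOKEN}"
    # 1) register the patient identity
    identity = session.post(f"{BASE_URL}/identities", json=patient).json()
    # 2) generate a project-specific pseudonym for that identity
    pseudonym = session.post(
        f"{BASE_URL}/pseudonyms",
        json={"identityId": identity["id"], "project": project},
    ).json()
    # 3) record the signed consent form, linked to the pseudonym
    session.post(
        f"{BASE_URL}/consents",
        json={"pseudonym": pseudonym["value"], "project": project, "status": "given"},
    )
    return pseudonym["value"]
```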
Affiliation(s)
- Eric Wündisch
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Peter Hufnagl
- Digital Pathology, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Peter Brunecker
- Core Unit Research IT, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sophie Meier zu Ummeln
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sarah Träger
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Marcus Kopp
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Fabian Prasser
- Medical Informatics Group, Center of Health Data Science, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Joachim Weber
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Center for Stroke Research Berlin, Charité – Universitätsmedizin Berlin, Berlin, Germany
- German Centre for Cardiovascular Research (DZHK), Berlin, Germany
5
Karch A, Schindler D, Kühn-Steven A, Blaser R, Kuhn KA, Sandmann L, Sommerer C, Guba M, Heemann U, Strohäker J, Glöckner S, Mikolajczyk R, Busch DH, Schulz TF. The transplant cohort of the German center for infection research (DZIF Tx-Cohort): study design and baseline characteristics. Eur J Epidemiol 2021; 36:233-241. [PMID: 33492549 PMCID: PMC7987595 DOI: 10.1007/s10654-020-00715-3]
Abstract
Infectious complications are the major cause of morbidity and mortality after solid organ and stem cell transplantation. To better understand host and environmental factors associated with an increased risk of infection as well as the effect of infections on function and survival of transplanted organs, we established the DZIF Transplant Cohort, a multicentre prospective cohort study within the organizational structure of the German Center for Infection Research. At the time of transplantation, heart-, kidney-, lung-, liver-, pancreas-, and hematopoietic stem cell-transplanted patients are enrolled into the study. Follow-up visits are scheduled at 3, 6, 9, 12 months after transplantation, and annually thereafter; extracurricular visits are conducted in case of infectious complications. Comprehensive standard operating procedures, web-based data collection and monitoring tools as well as a state-of-the-art biobanking concept for blood, purified PBMCs, urine, and faeces samples ensure high quality of data and biosample collection. By collecting detailed information on immunosuppressive medication, infectious complications, type of infectious agent and therapy, as well as by providing corresponding biosamples, the cohort will establish the foundation for a broad spectrum of studies in the field of infectious diseases and transplant medicine. By January 2020, baseline data and biosamples of about 1400 patients have been collected. We plan to recruit 3500 patients by 2023, and continue follow-up visits and the documentation of infectious events at least until 2025. Information about the DZIF Transplant Cohort is available at https://www.dzif.de/en/working-group/transplant-cohort.
Affiliation(s)
- André Karch
- Institute of Epidemiology and Social Medicine, University of Münster, Münster, Germany
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Daniela Schindler
- Department of Nephrology, Klinikum rechts der Isar of the Technical University Munich, Munich, Germany
- German Center for Infection Research, Munich Site, Munich, Germany
- Andrea Kühn-Steven
- German Center for Infection Research, Munich Site, Munich, Germany
- German Research Center for Environmental Health, Helmholtz Zentrum München, Munich, Germany
- Rainer Blaser
- German Center for Infection Research, Munich Site, Munich, Germany
- Institute of Medical Informatics, Statistics and Epidemiology, Technical University Munich, Munich, Germany
- Klaus A Kuhn
- German Center for Infection Research, Munich Site, Munich, Germany
- Institute of Medical Informatics, Statistics and Epidemiology, Technical University Munich, Munich, Germany
- Lisa Sandmann
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School (MHH), Hannover, Germany
- Claudia Sommerer
- German Center for Infection Research, Heidelberg Site, Heidelberg, Germany
- Nierenzentrum Heidelberg, Heidelberg, Germany
- Markus Guba
- German Center for Infection Research, Munich Site, Munich, Germany
- Department of General, Visceral and Transplantation Surgery, University Hospital, LMU Munich, Munich, Germany
- Uwe Heemann
- Department of Nephrology, Klinikum rechts der Isar of the Technical University Munich, Munich, Germany
- German Center for Infection Research, Munich Site, Munich, Germany
- Jens Strohäker
- German Center for Infection Research, Tübingen Site, Tübingen, Germany
- University Hospital for General, Visceral and Transplant Surgery, Tübingen, Germany
- Stephan Glöckner
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Epidemiology, Helmholtz Center for Infection Research Braunschweig, Brunswick, Germany
- Rafael Mikolajczyk
- Institute for Medical Epidemiology, Biometry and Informatics, Medical Faculty, Martin-Luther University Halle-Wittenberg, Halle, Germany
- Dirk H Busch
- German Center for Infection Research, Munich Site, Munich, Germany
- Institute for Medical Microbiology, Immunology and Hygiene (MIH), Technical University of Munich, Munich, Germany
- Thomas F Schulz
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Institute of Virology, Hannover Medical School (MHH), Hannover, Germany
6
Pung J, Rienhoff O. Key components and IT assistance of participant management in clinical research: a scoping review. JAMIA Open 2020; 3:449-458. [PMID: 33215078 PMCID: PMC7660951 DOI: 10.1093/jamiaopen/ooaa041]
Abstract
OBJECTIVES Managing participants and their data are fundamental for the success of a clinical trial. Our review identifies and describes processes that deal with management of trial participants and highlights information technology (IT) assistance for clinical research in the context of participant management. METHODS A scoping literature review design, based on the Preferred Reporting Items for Systematic Reviews and Meta-analyses statement, was used to identify literature on trial participant-related proceedings, work procedures, or workflows, and assisting electronic systems. RESULTS The literature search identified 1329 articles of which 111 were included for analysis. Participant-related procedures were categorized into 4 major trial processes: recruitment, obtaining informed consent, managing identities, and managing administrative data. Our results demonstrated that management of trial participants is considered in nearly every step of clinical trials, and that IT was successfully introduced to all participant-related areas of a clinical trial to facilitate processes. DISCUSSION There is no precise definition of participant management, so a broad search strategy was necessary, resulting in a high number of articles that had to be excluded. Nevertheless, this review provides a comprehensive overview of participant management-related components, which was lacking so far. The review contributes to a better understanding of how computer-assisted management of participants in clinical trials is possible.
Affiliation(s)
- Johannes Pung
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Otto Rienhoff
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
7
Hoffmann K, Cazemier K, Baldow C, Schuster S, Kheifetz Y, Schirm S, Horn M, Ernst T, Volgmann C, Thiede C, Hochhaus A, Bornhäuser M, Suttorp M, Scholz M, Glauche I, Loeffler M, Roeder I. Integration of mathematical model predictions into routine workflows to support clinical decision making in haematology. BMC Med Inform Decis Mak 2020; 20:28. [PMID: 32041606 PMCID: PMC7011438 DOI: 10.1186/s12911-020-1039-x]
Abstract
Background Individualization and patient-specific optimization of treatment is a major goal of modern health care. One way to achieve this goal is the application of high-resolution diagnostics together with the application of targeted therapies. However, the rising number of different treatment modalities also induces new challenges: Whereas randomized clinical trials focus on proving average treatment effects in specific groups of patients, direct conclusions at the individual patient level are problematic. Thus, the identification of the best patient-specific treatment options remains an open question. Systems medicine, specifically mechanistic mathematical models, can substantially support individual treatment optimization. In addition to providing a better general understanding of disease mechanisms and treatment effects, these models allow for an identification of patient-specific parameterizations and, therefore, provide individualized predictions for the effect of different treatment modalities. Results In the following we describe a software framework that facilitates the integration of mathematical models and computer simulations into routine clinical processes to support decision-making. This is achieved by combining standard data management and data exploration tools, with the generation and visualization of mathematical model predictions for treatment options at an individual patient level. Conclusions By integrating model results in an audit trail compatible manner into established clinical workflows, our framework has the potential to foster the use of systems-medical approaches in clinical practice. We illustrate the framework application by two use cases from the field of haematological oncology.
Affiliation(s)
- Katja Hoffmann
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Katja Cazemier
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Christoph Baldow
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Silvio Schuster
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Yuri Kheifetz
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Sibylle Schirm
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Matthias Horn
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Thomas Ernst
- Abteilung Hämatologie/Onkologie, Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
- Constanze Volgmann
- Abteilung Hämatologie/Onkologie, Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
- Christian Thiede
- Department of Internal Medicine, Medical Clinic I, University Hospital Carl Gustav Carus Dresden, Dresden, Germany
- Andreas Hochhaus
- Abteilung Hämatologie/Onkologie, Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
- Martin Bornhäuser
- Department of Internal Medicine, Medical Clinic I, University Hospital Carl Gustav Carus Dresden, Dresden, Germany
- National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
- Meinolf Suttorp
- Pediatric Hematology and Oncology, Department of Pediatrics, University Hospital Carl Gustav Carus Dresden, Dresden, Germany
- Markus Scholz
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Ingmar Glauche
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Markus Loeffler
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Ingo Roeder
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
8
Kohlmayer F, Lautenschläger R, Prasser F. Pseudonymization for research data collection: is the juice worth the squeeze? BMC Med Inform Decis Mak 2019; 19:178. [PMID: 31484555 PMCID: PMC6727563 DOI: 10.1186/s12911-019-0905-x]
Abstract
BACKGROUND The collection of data and biospecimens which characterize patients and probands in-depth is a core element of modern biomedical research. Relevant data must be considered highly sensitive and it needs to be protected from unauthorized use and re-identification. In this context, laws, regulations, guidelines and best-practices often recommend or mandate pseudonymization, which means that directly identifying data of subjects (e.g. names and addresses) is stored separately from data which is primarily needed for scientific analyses. DISCUSSION When (authorized) re-identification of subjects is not an exceptional but a common procedure, e.g. due to longitudinal data collection, implementing pseudonymization can significantly increase the complexity of software solutions. For example, data stored in distributed databases need to be dynamically combined with each other, which requires additional interfaces for communicating between the various subsystems. This increased complexity may lead to new attack vectors for intruders. Obviously, this is in contrast to the objective of improving data protection. What is lacking is a standardized process of evaluating and reporting risks, threats and countermeasures, which can be used to test whether integrating pseudonymization methods into data collection systems actually improves upon the degree of protection provided by system designs that simply follow common IT security best practices and implement fine-grained role-based access control models. To demonstrate that the methods used to describe systems employing pseudonymized data management are currently heterogeneous and ad-hoc, we examined the extent to which twelve recent studies address each of the six basic security properties defined by the International Organization for Standardization (ISO) standard 27000. We show inconsistencies across the studies, with most of them failing to mention one or more security properties. CONCLUSION We discuss the degree of privacy protection provided by implementing pseudonymization into research data collection processes. We conclude that (1) more research is needed on the interplay of pseudonymity, information security and data protection, (2) problem-specific guidelines for evaluating and reporting risks, threats and countermeasures should be developed and that (3) future work on pseudonymized research data collection should include the results of such structured and integrated analyses.
Affiliation(s)
- Florian Kohlmayer
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Ronald Lautenschläger
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Fabian Prasser
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.
9
Prasser F, Kohlbacher O, Mansmann U, Bauer B, Kuhn KA. Data Integration for Future Medicine (DIFUTURE). Methods Inf Med 2018; 57(S 01):e57-e65. [DOI: 10.3414/ME17-02-0022]
Abstract
INTRODUCTION This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers. OBJECTIVES The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level with a specific focus on data integration and sharing. GOVERNANCE AND POLICIES Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies safeguarding data use and sharing by technical and organizational measures providing highest levels of data protection. One of our central policies is that adequate methods of data sharing for each use case and project will be selected based on rigorous risk and threat analyses. Interdisciplinary groups have been installed in order to manage change. ARCHITECTURAL FRAMEWORK AND METHODOLOGY The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing. USE CASES From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. following the needs of physicians and researchers and aiming at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with focuses on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios. DISCUSSION Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already prototypically implemented and tested the most important components of our architecture.
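The three-step approach (technical harmonization, staging and transformation, loading into analytics platforms) can be pictured with a schematic Python sketch; all schemas, field names, and transformations below are placeholder assumptions for illustration and do not represent DIFUTURE code.

```python
def step1_import(raw_records):
    """Step 1: technical harmonization of source records onto a common schema."""
    return [
        {"subject": r.get("patient_id"), "code": r.get("icd10"), "value": r.get("value")}
        for r in raw_records
    ]

def step2_stage(records):
    """Step 2: staging and working environment - drop records without a code, remove duplicates."""
    seen, staged = set(), []
    for r in records:
        key = (r["subject"], r["code"])
        if r["code"] and key not in seen:
            seen.add(key)
            staged.append(r)
    return staged

def step3_load(records, warehouse):
    """Step 3: load into the target analytics platform (a plain list stands in here)."""
    warehouse.extend(records)

warehouse = []
step3_load(step2_stage(step1_import([{"patient_id": "P1", "icd10": "E11.9", "value": 1}])), warehouse)
print(warehouse)
```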
Affiliation(s)
- Fabian Prasser
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Correspondence to: Dr. Fabian Prasser, Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Ismaninger Straße 22, 81675 Munich, Germany
- Oliver Kohlbacher
- Department of Computer Science, Center for Bioinformatics and Quantitative Biology Center, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Ulrich Mansmann
- Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University Munich, Munich, Germany
- Bernhard Bauer
- Department of Computer Science, University of Augsburg, Augsburg, Germany
- Klaus A. Kuhn
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
10
Johnson SB. Clinical Research Informatics: Supporting the Research Study Lifecycle. Yearb Med Inform 2017; 26:193-200. [PMID: 29063565 PMCID: PMC6239240 DOI: 10.15265/iy-2017-022]
Abstract
Objectives: The primary goal of this review is to summarize significant developments in the field of Clinical Research Informatics (CRI) over the years 2015-2016. The secondary goal is to contribute to a deeper understanding of CRI as a field, through the development of a strategy for searching and classifying CRI publications. Methods: A search strategy was developed to query the PubMed database, using medical subject headings to both select and exclude articles, and filtering publications by date and other characteristics. A manual review classified publications using stages in the "research study lifecycle", with key stages that include study definition, participant enrollment, data management, data analysis, and results dissemination. Results: The search strategy generated 510 publications. The manual classification identified 125 publications as relevant to CRI, which were classified into seven different stages of the research lifecycle, and one additional class that pertained to multiple stages, referring to general infrastructure or standards. Important cross-cutting themes included new applications of electronic media (Internet, social media, mobile devices), standardization of data and procedures, and increased automation through the use of data mining and big data methods. Conclusions: The review revealed increased interest and support for CRI in large-scale projects across institutions, regionally, nationally, and internationally. A search strategy based on medical subject headings can find many relevant papers, but a large number of non-relevant papers need to be detected using text words which pertain to closely related fields such as computational statistics and clinical informatics. The research lifecycle was useful as a classification scheme by highlighting the relevance to the users of clinical research informatics solutions.
Affiliation(s)
- S. B. Johnson
- Healthcare Policy and Research, Weill Cornell Medicine, New York, USA
11
Prasser F, Kohlmayer F, Kuhn KA. Efficient and effective pruning strategies for health data de-identification. BMC Med Inform Decis Mak 2016; 16:49. [PMID: 27130179 PMCID: PMC4851781 DOI: 10.1186/s12911-016-0287-2]
Abstract
Background Privacy must be protected when sensitive biomedical data is shared, e.g. for research purposes. Data de-identification is an important safeguard, where datasets are transformed to meet two conflicting objectives: minimizing re-identification risks while maximizing data quality. Typically, de-identification methods search a solution space of possible data transformations to find a good solution to a given de-identification problem. In this process, parts of the search space must be excluded to maintain scalability. Objectives The set of transformations which are solution candidates is typically narrowed down by storing the results obtained during the search process and then using them to predict properties of the output of other transformations in terms of privacy (first objective) and data quality (second objective). However, due to the exponential growth of the size of the search space, previous implementations of this method are not well-suited when datasets contain many attributes which need to be protected. As this is often the case with biomedical research data, e.g. as a result of longitudinal collection, we have developed a novel method. Methods Our approach combines the mathematical concept of antichains with a data structure inspired by prefix trees to represent properties of a large number of data transformations while requiring only a minimal amount of information to be stored. To analyze the improvements which can be achieved by adopting our method, we have integrated it into an existing algorithm and we have also implemented a simple best-first branch and bound search (BFS) algorithm as a first step towards methods which fully exploit our approach. We have evaluated these implementations with several real-world datasets and the k-anonymity privacy model. Results When integrated into existing de-identification algorithms for low-dimensional data, our approach reduced memory requirements by up to one order of magnitude and execution times by up to 25 %. This allowed us to increase the size of solution spaces which could be processed by almost a factor of 10. When using the simple BFS method, we were able to further increase the size of the solution space by a factor of three. When used as a heuristic strategy for high-dimensional data, the BFS approach outperformed a state-of-the-art algorithm by up to 12 % in terms of the quality of output data. Conclusions This work shows that implementing methods of data de-identification for real-world applications is a challenging task. Our approach solves a problem often faced by data custodians: a lack of scalability of de-identification software when used with datasets having realistic schemas and volumes. The method described in this article has been implemented into ARX, an open source de-identification software for biomedical data.
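For readers unfamiliar with the privacy model used in this evaluation, here is a minimal Python illustration of a k-anonymity check (illustrative only, not ARX code): a dataset satisfies k-anonymity if every combination of quasi-identifier values occurs in at least k records; the example data and attribute names are assumptions.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

data = [
    {"age": "30-39", "zip": "101**", "diagnosis": "I10"},
    {"age": "30-39", "zip": "101**", "diagnosis": "E11"},
    {"age": "40-49", "zip": "102**", "diagnosis": "J45"},
]
print(is_k_anonymous(data, ["age", "zip"], k=2))  # False: the last group has only one record
```

De-identification algorithms such as those discussed above search over generalizations of the quasi-identifiers (e.g. coarser age bands or shorter ZIP prefixes) until a check like this passes while data quality remains as high as possible.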
Affiliation(s)
- Fabian Prasser
- Chair of Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Munich, 81675, Germany.
- Florian Kohlmayer
- Chair of Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Munich, 81675, Germany
- Klaus A Kuhn
- Chair of Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Munich, 81675, Germany