1
Abu Attieh H, Müller A, Wirth FN, Prasser F. Pseudonymization tools for medical research: a systematic review. BMC Med Inform Decis Mak 2025; 25:128. [PMID: 40075358 PMCID: PMC11905493 DOI: 10.1186/s12911-025-02958-0]
Abstract
BACKGROUND Pseudonymization is an important technique for the secure and compliant use of medical data in research. At its core, pseudonymization is a process in which directly identifying information is separated from medical research data. Due to its importance, a wide range of pseudonymization tools and services have been developed, and researchers face the challenge of selecting an appropriate tool for their research projects. This review aims to address this challenge by systematically comparing existing tools. METHODS A systematic review was performed and is reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines where applicable. The search covered PubMed and Web of Science to identify pseudonymization tools documented in the scientific literature. The tools were assessed based on predefined criteria across four key dimensions that describe researchers' requirements: (1) single-center vs. multi-center use, (2) short-term vs. long-term projects, (3) small data vs. big data processing, and (4) integration vs. standalone functionality. RESULTS From an initial pool of 1,052 papers, 92 were selected for detailed full-text review after the title and abstract screening. This led to the identification of 20 pseudonymization tools, of which 10 met our inclusion criteria and were assessed. The results show that there are differences between the tools that make them more or less suited for research projects differing with regard to the dimensions described above, enabling us to provide targeted recommendations. CONCLUSIONS The landscape of existing pseudonymization tools is heterogeneous, and researchers need to carefully select the appropriate solutions for their research projects. Our findings highlight two Software-as-a-Service-based solutions that enable centralized use without local infrastructure, one tool for retrospective pseudonymization of existing databases, two tools suitable for local deployment in smaller, short-term projects, and two tools well-suited for local deployment in large, multi-center studies.
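To make the core idea behind pseudonymization concrete, the following minimal Python sketch (illustrative only, not taken from any of the reviewed tools) separates directly identifying attributes from the research payload and links the two stores via a randomly generated pseudonym; the field names are assumptions.

```python
import secrets

def pseudonymize(records, identifying_keys=("name", "address")):
    """Split each record into an identity store and a research data store."""
    identity_store = {}   # pseudonym -> directly identifying attributes
    research_store = []   # research data carrying only the pseudonym
    for record in records:
        pseudonym = secrets.token_hex(8)  # random, non-derivable identifier
        identity_store[pseudonym] = {k: record[k] for k in identifying_keys if k in record}
        payload = {k: v for k, v in record.items() if k not in identifying_keys}
        payload["pseudonym"] = pseudonym
        research_store.append(payload)
    return identity_store, research_store

identities, research_data = pseudonymize(
    [{"name": "Jane Doe", "address": "Main St 1", "diagnosis": "I10"}]
)
print(research_data)  # e.g. [{'diagnosis': 'I10', 'pseudonym': '...'}]
```

Re-identification then requires access to the separately held identity store, which is the separation the review's requirement dimensions build on.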
Affiliation(s)
- Hammam Abu Attieh
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany.
- Armin Müller
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany
- Felix Nikolaus Wirth
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany
- Fabian Prasser
- Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Medical Informatics Group, Charitéplatz 1, 10117, Berlin, Germany
2
Tarride JE, Okoh A, Aryal K, Prada C, Milinkovic D, Keepanasseril A, Iorio A. Scoping review of the recommendations and guidance for improving the quality of rare disease registries. Orphanet J Rare Dis 2024; 19:187. [PMID: 38711103 PMCID: PMC11075280 DOI: 10.1186/s13023-024-03193-y]
Abstract
BACKGROUND Rare disease registries (RDRs) are valuable tools for improving clinical care and advancing research. However, they often vary qualitatively, structurally, and operationally in ways that can determine their potential utility as a source of evidence to support decision-making regarding the approval and funding of new treatments for rare diseases. OBJECTIVES The goal of this research project was to review the literature on rare disease registries and identify best practices to improve the quality of RDRs. METHODS In this scoping review, we searched MEDLINE and EMBASE as well as the websites of regulatory bodies and health technology assessment agencies from 2010 to April 2023 for literature offering guidance or recommendations to ensure, improve, or maintain quality RDRs. RESULTS The search yielded 1,175 unique references, of which 64 met the inclusion criteria. The characteristics of RDRs deemed to be relevant to their quality align with three main domains and several sub-domains considered to be best practices for quality RDRs: (1) governance (registry purpose and description; governance structure; stakeholder engagement; sustainability; ethics/legal/privacy; data governance; documentation; and training and support); (2) data (standardized disease classification; common data elements; data dictionary; data collection; data quality and assurance; and data analysis and reporting); and (3) information technology (IT) infrastructure (physical and virtual infrastructure; and software infrastructure guided by the FAIR principles of Findability, Accessibility, Interoperability, and Reusability). CONCLUSIONS Although RDRs face numerous challenges due to their small and dispersed populations, they can generate quality data to support healthcare decision-making through the use of standards and principles on strong governance, quality data practices, and IT infrastructure.
Affiliation(s)
- J E Tarride
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Centre for Health Economics and Policy Analysis (CHEPA), McMaster University, Hamilton, Canada
- Programs for the Assessment of Technologies in Health (PATH), The Research Institute of St. Joe's Hamilton, St. Joseph's Healthcare Hamilton, Hamilton, ON, Canada
- A Okoh
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- K Aryal
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- C Prada
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Deborah Milinkovic
- Centre for Health Economics and Policy Analysis (CHEPA), McMaster University, Hamilton, Canada.
- A Keepanasseril
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- A Iorio
- Department of Health Research Methods, Evidence and Impact, Faculty of Health Sciences, McMaster University, Hamilton, Canada
3
Abu Attieh H, Neves DT, Guedes M, Mirandola M, Dellacasa C, Rossi E, Prasser F. A Scalable Pseudonymization Tool for Rapid Deployment in Large Biomedical Research Networks: Development and Evaluation Study. JMIR Med Inform 2024; 12:e49646. [PMID: 38654577 PMCID: PMC11063579 DOI: 10.2196/49646]
Abstract
Background The SARS-CoV-2 pandemic has demonstrated once again that rapid collaborative research is essential for the future of biomedicine. Large research networks are needed to collect, share, and reuse data and biosamples to generate collaborative evidence. However, setting up such networks is often complex and time-consuming, as common tools and policies are needed to ensure interoperability and the required flows of data and samples, especially for handling personal data and the associated data protection issues. In biomedical research, pseudonymization detaches directly identifying details from biomedical data and biosamples and connects them using secure identifiers, the so-called pseudonyms. This protects privacy by design but allows the necessary linkage and reidentification. Objective Although pseudonymization is used in almost every biomedical study, there are currently no pseudonymization tools that can be rapidly deployed across many institutions. Moreover, using centralized services is often not possible, for example, when data are reused and consent for this type of data processing is lacking. We present the ORCHESTRA Pseudonymization Tool (OPT), developed under the umbrella of the ORCHESTRA consortium, which faced exactly these challenges when it came to rapidly establishing a large-scale research network in the context of the rapid pandemic response in Europe. Methods To overcome challenges caused by the heterogeneity of IT infrastructures across institutions, the OPT was developed based on programmable runtime environments available at practically every institution: office suites. The software is highly configurable and provides many features, from subject and biosample registration to record linkage and the printing of machine-readable codes for labeling biosample tubes. Special care has been taken to ensure that the algorithms implemented are efficient so that the OPT can be used to pseudonymize large data sets, which we demonstrate through a comprehensive evaluation. Results The OPT is available for Microsoft Office and LibreOffice, so it can be deployed on Windows, Linux, and MacOS. It provides multiuser support and is configurable to meet the needs of different types of research projects. Within the ORCHESTRA research network, the OPT has been successfully deployed at 13 institutions in 11 countries in Europe and beyond. As of June 2023, the software manages data about more than 30,000 subjects and 15,000 biosamples. Over 10,000 labels have been printed. The results of our experimental evaluation show that the OPT offers practical response times for all major functionalities, pseudonymizing 100,000 subjects in 10 seconds using Microsoft Excel and in 54 seconds using LibreOffice. Conclusions Innovative solutions are needed to make the process of establishing large research networks more efficient. The OPT, which leverages the runtime environment of common office suites, can be used to rapidly deploy pseudonymization and biosample management capabilities across research networks. The tool is highly configurable and available as open-source software.
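As an illustration of the kind of batch pseudonym generation described above, here is a hypothetical Python sketch; it is not the OPT implementation (which runs inside office suites), and the alphabet, pseudonym length, and subject ID format are assumptions chosen for readability on printed labels.

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # easy to print and scan on tube labels

def generate_pseudonyms(subject_ids, length=10):
    """Assign a unique, fixed-length random pseudonym to every subject identifier."""
    mapping, used = {}, set()
    for subject_id in subject_ids:
        while True:
            candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if candidate not in used:  # enforce uniqueness within the batch
                used.add(candidate)
                mapping[subject_id] = candidate
                break
    return mapping

pseudonyms = generate_pseudonyms([f"SUBJ-{i:06d}" for i in range(100_000)])
print(len(set(pseudonyms.values())))  # 100000 distinct pseudonyms
```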
Affiliation(s)
- Hammam Abu Attieh
- Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Diogo Telmo Neves
- Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Mariana Guedes
- Infection and Antimicrobial Resistance Control and Prevention Unit, Centro Hospitalar Universitário São João, Porto, Portugal
- Infectious Diseases and Microbiology Division, Hospital Universitario Virgen Macarena, Sevilla, Spain
- Department of Medicine, University of Sevilla/Instituto de Biomedicina de Sevilla (IBiS)/Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Massimo Mirandola
- Infectious Diseases Division, Diagnostic and Public Health Department, University of Verona, Verona, Italy
- Chiara Dellacasa
- High Performance Computing (HPC) Department, CINECA - Consorzio Interuniversitario, Bologna, Italy
- Elisa Rossi
- High Performance Computing (HPC) Department, CINECA - Consorzio Interuniversitario, Bologna, Italy
- Fabian Prasser
- Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
4
Wündisch E, Hufnagl P, Brunecker P, Meier zu Ummeln S, Träger S, Kopp M, Prasser F, Weber J. Development of a Trusted Third Party at a Large University Hospital: Design and Implementation Study. JMIR Med Inform 2024; 12:e53075. [PMID: 38632712 PMCID: PMC11040164 DOI: 10.2196/53075]
Abstract
Background Pseudonymization has become a best practice to securely manage the identities of patients and study participants in medical research projects and data sharing initiatives. This method offers the advantage of not requiring the direct identification of data to support various research processes while still allowing for advanced processing activities, such as data linkage. Often, pseudonymization and related functionalities are bundled in specific technical and organization units known as trusted third parties (TTPs). However, pseudonymization can significantly increase the complexity of data management and research workflows, necessitating adequate tool support. Common tasks of TTPs include supporting the secure registration and pseudonymization of patient and sample identities as well as managing consent. Objective Despite the challenges involved, little has been published about successful architectures and functional tools for implementing TTPs in large university hospitals. The aim of this paper is to fill this research gap by describing the software architecture and tool set developed and deployed as part of a TTP established at Charité - Universitätsmedizin Berlin. Methods The infrastructure for the TTP was designed to provide a modular structure while keeping maintenance requirements low. Basic functionalities were realized with the free MOSAIC tools. However, supporting common study processes requires implementing workflows that span different basic services, such as patient registration, followed by pseudonym generation and concluded by consent collection. To achieve this, an integration layer was developed to provide a unified Representational state transfer (REST) application programming interface (API) as a basis for more complex workflows. Based on this API, a unified graphical user interface was also implemented, providing an integrated view of information objects and workflows supported by the TTP. The API was implemented using Java and Spring Boot, while the graphical user interface was implemented in PHP and Laravel. Both services use a shared Keycloak instance as a unified management system for roles and rights. Results By the end of 2022, the TTP has already supported more than 10 research projects since its launch in December 2019. Within these projects, more than 3000 identities were stored, more than 30,000 pseudonyms were generated, and more than 1500 consent forms were submitted. In total, more than 150 people regularly work with the software platform. By implementing the integration layer and the unified user interface, together with comprehensive roles and rights management, the effort for operating the TTP could be significantly reduced, as personnel of the supported research projects can use many functionalities independently. Conclusions With the architecture and components described, we created a user-friendly and compliant environment for supporting research projects. We believe that the insights into the design and implementation of our TTP can help other institutions to efficiently and effectively set up corresponding structures.
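The following hypothetical Python sketch illustrates how a client could drive such a cross-service workflow (identity registration, pseudonym generation, consent recording) through a unified REST API; the endpoint paths, payload fields, and token handling are assumptions made for illustration and do not describe the actual Charité TTP interface.

```python
import requests

BASE_URL = "https://ttp.example.org/api"  # placeholder base URL
ACCESS_TOKEN = "..."                      # e.g. obtained beforehand from an identity provider such as Keycloak

def register_and_consent(patient, project):
    """Hypothetical workflow: register identity, generate pseudonym, record consent."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {ACCESS_TOKEN}"
    # 1) register the patient identity
    identity = session.post(f"{BASE_URL}/identities", json=patient).json()
    # 2) generate a project-specific pseudonym for that identity
    pseudonym = session.post(
        f"{BASE_URL}/pseudonyms",
        json={"identityId": identity["id"], "project": project},
    ).json()
    # 3) record the signed consent form, linked to the pseudonym
    session.post(
        f"{BASE_URL}/consents",
        json={"pseudonym": pseudonym["value"], "project": project, "status": "given"},
    )
    return pseudonym["value"]
```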
Affiliation(s)
- Eric Wündisch
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Peter Hufnagl
- Digital Pathology, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Peter Brunecker
- Core Unit Research IT, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sophie Meier zu Ummeln
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sarah Träger
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Marcus Kopp
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Fabian Prasser
- Medical Informatics Group, Center of Health Data Science, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Joachim Weber
- Core Unit THS, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Center for Stroke Research Berlin, Charité – Universitätsmedizin Berlin, Berlin, Germany
- German Centre for Cardiovascular Research (DZHK), Berlin, Germany
5
Karch A, Schindler D, Kühn-Steven A, Blaser R, Kuhn KA, Sandmann L, Sommerer C, Guba M, Heemann U, Strohäker J, Glöckner S, Mikolajczyk R, Busch DH, Schulz TF. The transplant cohort of the German center for infection research (DZIF Tx-Cohort): study design and baseline characteristics. Eur J Epidemiol 2021; 36:233-241. [PMID: 33492549 PMCID: PMC7987595 DOI: 10.1007/s10654-020-00715-3]
Abstract
Infectious complications are the major cause of morbidity and mortality after solid organ and stem cell transplantation. To better understand host and environmental factors associated with an increased risk of infection as well as the effect of infections on function and survival of transplanted organs, we established the DZIF Transplant Cohort, a multicentre prospective cohort study within the organizational structure of the German Center for Infection Research. At the time of transplantation, heart-, kidney-, lung-, liver-, pancreas-, and hematopoietic stem cell-transplanted patients are enrolled into the study. Follow-up visits are scheduled at 3, 6, 9, 12 months after transplantation, and annually thereafter; extracurricular visits are conducted in case of infectious complications. Comprehensive standard operating procedures, web-based data collection and monitoring tools as well as a state-of-the-art biobanking concept for blood, purified PBMCs, urine, and faeces samples ensure high quality of data and biosample collection. By collecting detailed information on immunosuppressive medication, infectious complications, type of infectious agent and therapy, as well as by providing corresponding biosamples, the cohort will establish the foundation for a broad spectrum of studies in the field of infectious diseases and transplant medicine. By January 2020, baseline data and biosamples of about 1400 patients have been collected. We plan to recruit 3500 patients by 2023, and continue follow-up visits and the documentation of infectious events at least until 2025. Information about the DZIF Transplant Cohort is available at https://www.dzif.de/en/working-group/transplant-cohort.
Affiliation(s)
- André Karch
- Institute of Epidemiology and Social Medicine, University of Münster, Münster, Germany
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Daniela Schindler
- Department of Nephrology, Klinikum rechts der Isar of the Technical University Munich, Munich, Germany
- German Center for Infection Research, Munich Site, Munich, Germany
- Andrea Kühn-Steven
- German Center for Infection Research, Munich Site, Munich, Germany
- German Research Center for Environmental Health, Helmholtz Zentrum München, Munich, Germany
- Rainer Blaser
- German Center for Infection Research, Munich Site, Munich, Germany
- Institute of Medical Informatics, Statistics and Epidemiology, Technical University Munich, Munich, Germany
- Klaus A Kuhn
- German Center for Infection Research, Munich Site, Munich, Germany
- Institute of Medical Informatics, Statistics and Epidemiology, Technical University Munich, Munich, Germany
- Lisa Sandmann
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School (MHH), Hannover, Germany
- Claudia Sommerer
- German Center for Infection Research, Heidelberg Site, Heidelberg, Germany
- Nierenzentrum Heidelberg, Heidelberg, Germany
- Markus Guba
- German Center for Infection Research, Munich Site, Munich, Germany
- Department of General, Visceral and Transplantation Surgery, University Hospital, LMU Munich, Munich, Germany
- Uwe Heemann
- Department of Nephrology, Klinikum rechts der Isar of the Technical University Munich, Munich, Germany
- German Center for Infection Research, Munich Site, Munich, Germany
- Jens Strohäker
- German Center for Infection Research, Tübingen Site, Tübingen, Germany
- University Hospital for General, Visceral and Transplant Surgery, Tübingen, Germany
- Stephan Glöckner
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Epidemiology, Helmholtz Center for Infection Research Braunschweig, Brunswick, Germany
- Rafael Mikolajczyk
- Institute for Medical Epidemiology, Biometry and Informatics, Medical Faculty, Martin-Luther University Halle-Wittenberg, Halle, Germany
- Dirk H Busch
- German Center for Infection Research, Munich Site, Munich, Germany
- Institute for Medical Microbiology, Immunology and Hygiene (MIH), Technical University of Munich, Munich, Germany
- Thomas F Schulz
- German Center for Infection Research, Hannover-Braunschweig Site, Brunswick, Germany
- Institute of Virology, Hannover Medical School (MHH), Hannover, Germany
6
Pung J, Rienhoff O. Key components and IT assistance of participant management in clinical research: a scoping review. JAMIA Open 2020; 3:449-458. [PMID: 33215078 PMCID: PMC7660951 DOI: 10.1093/jamiaopen/ooaa041]
Abstract
OBJECTIVES Managing participants and their data are fundamental for the success of a clinical trial. Our review identifies and describes processes that deal with management of trial participants and highlights information technology (IT) assistance for clinical research in the context of participant management. METHODS A scoping literature review design, based on the Preferred Reporting Items for Systematic Reviews and Meta-analyses statement, was used to identify literature on trial participant-related proceedings, work procedures, or workflows, and assisting electronic systems. RESULTS The literature search identified 1329 articles of which 111 were included for analysis. Participant-related procedures were categorized into 4 major trial processes: recruitment, obtaining informed consent, managing identities, and managing administrative data. Our results demonstrated that management of trial participants is considered in nearly every step of clinical trials, and that IT was successfully introduced to all participant-related areas of a clinical trial to facilitate processes. DISCUSSION There is no precise definition of participant management, so a broad search strategy was necessary, resulting in a high number of articles that had to be excluded. Nevertheless, this review provides a comprehensive overview of participant management-related components, which was lacking so far. The review contributes to a better understanding of how computer-assisted management of participants in clinical trials is possible.
Affiliation(s)
- Johannes Pung
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Otto Rienhoff
- Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
7
Hoffmann K, Cazemier K, Baldow C, Schuster S, Kheifetz Y, Schirm S, Horn M, Ernst T, Volgmann C, Thiede C, Hochhaus A, Bornhäuser M, Suttorp M, Scholz M, Glauche I, Loeffler M, Roeder I. Integration of mathematical model predictions into routine workflows to support clinical decision making in haematology. BMC Med Inform Decis Mak 2020; 20:28. [PMID: 32041606 PMCID: PMC7011438 DOI: 10.1186/s12911-020-1039-x]
Abstract
Background Individualization and patient-specific optimization of treatment is a major goal of modern health care. One way to achieve this goal is the application of high-resolution diagnostics together with the application of targeted therapies. However, the rising number of different treatment modalities also induces new challenges: Whereas randomized clinical trials focus on proving average treatment effects in specific groups of patients, direct conclusions at the individual patient level are problematic. Thus, the identification of the best patient-specific treatment options remains an open question. Systems medicine, specifically mechanistic mathematical models, can substantially support individual treatment optimization. In addition to providing a better general understanding of disease mechanisms and treatment effects, these models allow for an identification of patient-specific parameterizations and, therefore, provide individualized predictions for the effect of different treatment modalities. Results In the following we describe a software framework that facilitates the integration of mathematical models and computer simulations into routine clinical processes to support decision-making. This is achieved by combining standard data management and data exploration tools, with the generation and visualization of mathematical model predictions for treatment options at an individual patient level. Conclusions By integrating model results in an audit trail compatible manner into established clinical workflows, our framework has the potential to foster the use of systems-medical approaches in clinical practice. We illustrate the framework application by two use cases from the field of haematological oncology.
Affiliation(s)
- Katja Hoffmann
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Katja Cazemier
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Christoph Baldow
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Silvio Schuster
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Yuri Kheifetz
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Sibylle Schirm
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Matthias Horn
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Thomas Ernst
- Abteilung Hämatologie/Onkologie, Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
- Constanze Volgmann
- Abteilung Hämatologie/Onkologie, Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
- Christian Thiede
- Department of Internal Medicine, Medical Clinic I, University Hospital Carl Gustav Carus Dresden, Dresden, Germany
- Andreas Hochhaus
- Abteilung Hämatologie/Onkologie, Klinik für Innere Medizin II, Universitätsklinikum Jena, Jena, Germany
- Martin Bornhäuser
- Department of Internal Medicine, Medical Clinic I, University Hospital Carl Gustav Carus Dresden, Dresden, Germany
- National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
- Meinolf Suttorp
- Pediatric Hematology and Oncology, Department of Pediatrics, University Hospital Carl Gustav Carus Dresden, Dresden, Germany
- Markus Scholz
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Ingmar Glauche
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- Markus Loeffler
- Institute for Medical Informatics, Statistics and Epidemiology, Faculty of Medicine, University of Leipzig, Leipzig, Germany
- Ingo Roeder
- Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Dresden, Germany
- National Center for Tumor Diseases (NCT), Partner Site Dresden, Dresden, Germany
8
Kohlmayer F, Lautenschläger R, Prasser F. Pseudonymization for research data collection: is the juice worth the squeeze? BMC Med Inform Decis Mak 2019; 19:178. [PMID: 31484555 PMCID: PMC6727563 DOI: 10.1186/s12911-019-0905-x]
Abstract
BACKGROUND The collection of data and biospecimens which characterize patients and probands in-depth is a core element of modern biomedical research. Relevant data must be considered highly sensitive and it needs to be protected from unauthorized use and re-identification. In this context, laws, regulations, guidelines and best-practices often recommend or mandate pseudonymization, which means that directly identifying data of subjects (e.g. names and addresses) is stored separately from data which is primarily needed for scientific analyses. DISCUSSION When (authorized) re-identification of subjects is not an exceptional but a common procedure, e.g. due to longitudinal data collection, implementing pseudonymization can significantly increase the complexity of software solutions. For example, data stored in distributed databases need to be dynamically combined with each other, which requires additional interfaces for communicating between the various subsystems. This increased complexity may lead to new attack vectors for intruders. Obviously, this is in contrast to the objective of improving data protection. What is lacking is a standardized process of evaluating and reporting risks, threats and countermeasures, which can be used to test whether integrating pseudonymization methods into data collection systems actually improves upon the degree of protection provided by system designs that simply follow common IT security best practices and implement fine-grained role-based access control models. To demonstrate that the methods used to describe systems employing pseudonymized data management are currently heterogeneous and ad-hoc, we examined the extent to which twelve recent studies address each of the six basic security properties defined by the International Organization for Standardization (ISO) standard 27000. We show inconsistencies across the studies, with most of them failing to mention one or more security properties. CONCLUSION We discuss the degree of privacy protection provided by implementing pseudonymization into research data collection processes. We conclude that (1) more research is needed on the interplay of pseudonymity, information security and data protection, (2) problem-specific guidelines for evaluating and reporting risks, threats and countermeasures should be developed and that (3) future work on pseudonymized research data collection should include the results of such structured and integrated analyses.
Affiliation(s)
- Florian Kohlmayer
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Ronald Lautenschläger
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Fabian Prasser
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany.
9
Prasser F, Kohlbacher O, Mansmann U, Bauer B, Kuhn KA. Data Integration for Future Medicine (DIFUTURE). Methods Inf Med 2018; 57(S 01):e57-e65. [DOI: 10.3414/ME17-02-0022]
Abstract
INTRODUCTION This article is part of the Focus Theme of Methods of Information in Medicine on the German Medical Informatics Initiative. Future medicine will be predictive, preventive, personalized, participatory and digital. Data and knowledge at comprehensive depth and breadth need to be available for research and at the point of care as a basis for targeted diagnosis and therapy. Data integration and data sharing will be essential to achieve these goals. For this purpose, the consortium Data Integration for Future Medicine (DIFUTURE) will establish Data Integration Centers (DICs) at university medical centers. OBJECTIVES The infrastructure envisioned by DIFUTURE will provide researchers with cross-site access to data and support physicians by innovative views on integrated data as well as by decision support components for personalized treatments. The aim of our use cases is to show that this accelerates innovation, improves health care processes and results in tangible benefits for our patients. To realize our vision, numerous challenges have to be addressed. The objective of this article is to describe our concepts and solutions on the technical and the organizational level with a specific focus on data integration and sharing. GOVERNANCE AND POLICIES Data sharing implies significant security and privacy challenges. Therefore, state-of-the-art data protection, modern IT security concepts and patient trust play a central role in our approach. We have established governance structures and policies safeguarding data use and sharing by technical and organizational measures providing highest levels of data protection. One of our central policies is that adequate methods of data sharing for each use case and project will be selected based on rigorous risk and threat analyses. Interdisciplinary groups have been installed in order to manage change. ARCHITECTURAL FRAMEWORK AND METHODOLOGY The DIFUTURE Data Integration Centers will implement a three-step approach to integrating, harmonizing and sharing structured, unstructured and omics data as well as images from clinical and research environments. First, data is imported and technically harmonized using common data and interface standards (including various IHE profiles, DICOM and HL7 FHIR). Second, data is preprocessed, transformed, harmonized and enriched within a staging and working environment. Third, data is imported into common analytics platforms and data models (including i2b2 and tranSMART) and made accessible in a form compliant with the interoperability requirements defined on the national level. Secure data access and sharing will be implemented with innovative combinations of privacy-enhancing technologies (safe data, safe settings, safe outputs) and methods of distributed computing. USE CASES From the perspective of health care and medical research, our approach is disease-oriented and use-case driven, i.e. following the needs of physicians and researchers and aiming at measurable benefits for our patients. We will work on early diagnosis, tailored therapies and therapy decision tools with focuses on neurology, oncology and further disease entities. Our early use cases will serve as blueprints for the following ones, verifying that the infrastructure developed by DIFUTURE is able to support a variety of application scenarios. DISCUSSION Our own previous work, the use of internationally successful open source systems and a state-of-the-art software architecture are cornerstones of our approach. In the conceptual phase of the initiative, we have already prototypically implemented and tested the most important components of our architecture.
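The three-step approach (technical harmonization, staging and transformation, loading into analytics platforms) can be pictured with a schematic Python sketch; all schemas, field names, and transformations below are placeholder assumptions for illustration and do not represent DIFUTURE code.

```python
def step1_import(raw_records):
    """Step 1: technical harmonization of source records onto a common schema."""
    return [
        {"subject": r.get("patient_id"), "code": r.get("icd10"), "value": r.get("value")}
        for r in raw_records
    ]

def step2_stage(records):
    """Step 2: staging and working environment - drop records without a code, remove duplicates."""
    seen, staged = set(), []
    for r in records:
        key = (r["subject"], r["code"])
        if r["code"] and key not in seen:
            seen.add(key)
            staged.append(r)
    return staged

def step3_load(records, warehouse):
    """Step 3: load into the target analytics platform (a plain list stands in here)."""
    warehouse.extend(records)

warehouse = []
step3_load(step2_stage(step1_import([{"patient_id": "P1", "icd10": "E11.9", "value": 1}])), warehouse)
print(warehouse)
```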
Affiliation(s)
- Fabian Prasser
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Correspondence to: Dr. Fabian Prasser, Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Ismaninger Straße 22, 81675 Munich, Germany
- Oliver Kohlbacher
- Department of Computer Science, Center for Bioinformatics and Quantitative Biology Center, Eberhard-Karls-Universität Tübingen, Tübingen, Germany
- Max Planck Institute for Developmental Biology, Tübingen, Germany
- Ulrich Mansmann
- Institute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-University Munich, Munich, Germany
- Bernhard Bauer
- Department of Computer Science, University of Augsburg, Augsburg, Germany
- Klaus A. Kuhn
- Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
10
Johnson SB. Clinical Research Informatics: Supporting the Research Study Lifecycle. Yearb Med Inform 2017; 26:193-200. [PMID: 29063565 PMCID: PMC6239240 DOI: 10.15265/iy-2017-022]
Abstract
Objectives: The primary goal of this review is to summarize significant developments in the field of Clinical Research Informatics (CRI) over the years 2015-2016. The secondary goal is to contribute to a deeper understanding of CRI as a field, through the development of a strategy for searching and classifying CRI publications. Methods: A search strategy was developed to query the PubMed database, using medical subject headings to both select and exclude articles, and filtering publications by date and other characteristics. A manual review classified publications using stages in the "research study lifecycle", with key stages that include study definition, participant enrollment, data management, data analysis, and results dissemination. Results: The search strategy generated 510 publications. The manual classification identified 125 publications as relevant to CRI, which were classified into seven different stages of the research lifecycle, and one additional class that pertained to multiple stages, referring to general infrastructure or standards. Important cross-cutting themes included new applications of electronic media (Internet, social media, mobile devices), standardization of data and procedures, and increased automation through the use of data mining and big data methods. Conclusions: The review revealed increased interest and support for CRI in large-scale projects across institutions, regionally, nationally, and internationally. A search strategy based on medical subject headings can find many relevant papers, but a large number of non-relevant papers need to be detected using text words which pertain to closely related fields such as computational statistics and clinical informatics. The research lifecycle was useful as a classification scheme by highlighting the relevance to the users of clinical research informatics solutions.
Affiliation(s)
- S. B. Johnson
- Healthcare Policy and Research, Weill Cornell Medicine, New York, USA
11
Prasser F, Kohlmayer F, Kuhn KA. Efficient and effective pruning strategies for health data de-identification. BMC Med Inform Decis Mak 2016; 16:49. [PMID: 27130179 PMCID: PMC4851781 DOI: 10.1186/s12911-016-0287-2]
Abstract
Background Privacy must be protected when sensitive biomedical data is shared, e.g. for research purposes. Data de-identification is an important safeguard, where datasets are transformed to meet two conflicting objectives: minimizing re-identification risks while maximizing data quality. Typically, de-identification methods search a solution space of possible data transformations to find a good solution to a given de-identification problem. In this process, parts of the search space must be excluded to maintain scalability. Objectives The set of transformations which are solution candidates is typically narrowed down by storing the results obtained during the search process and then using them to predict properties of the output of other transformations in terms of privacy (first objective) and data quality (second objective). However, due to the exponential growth of the size of the search space, previous implementations of this method are not well-suited when datasets contain many attributes which need to be protected. As this is often the case with biomedical research data, e.g. as a result of longitudinal collection, we have developed a novel method. Methods Our approach combines the mathematical concept of antichains with a data structure inspired by prefix trees to represent properties of a large number of data transformations while requiring only a minimal amount of information to be stored. To analyze the improvements which can be achieved by adopting our method, we have integrated it into an existing algorithm and we have also implemented a simple best-first branch and bound search (BFS) algorithm as a first step towards methods which fully exploit our approach. We have evaluated these implementations with several real-world datasets and the k-anonymity privacy model. Results When integrated into existing de-identification algorithms for low-dimensional data, our approach reduced memory requirements by up to one order of magnitude and execution times by up to 25 %. This allowed us to increase the size of solution spaces which could be processed by almost a factor of 10. When using the simple BFS method, we were able to further increase the size of the solution space by a factor of three. When used as a heuristic strategy for high-dimensional data, the BFS approach outperformed a state-of-the-art algorithm by up to 12 % in terms of the quality of output data. Conclusions This work shows that implementing methods of data de-identification for real-world applications is a challenging task. Our approach solves a problem often faced by data custodians: a lack of scalability of de-identification software when used with datasets having realistic schemas and volumes. The method described in this article has been implemented into ARX, an open source de-identification software for biomedical data.
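For readers unfamiliar with the privacy model used in this evaluation, here is a minimal Python illustration of a k-anonymity check (illustrative only, not ARX code): a dataset satisfies k-anonymity if every combination of quasi-identifier values occurs in at least k records; the example data and attribute names are assumptions.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

data = [
    {"age": "30-39", "zip": "101**", "diagnosis": "I10"},
    {"age": "30-39", "zip": "101**", "diagnosis": "E11"},
    {"age": "40-49", "zip": "102**", "diagnosis": "J45"},
]
print(is_k_anonymous(data, ["age", "zip"], k=2))  # False: the last group has only one record
```

De-identification algorithms such as those discussed above search over generalizations of the quasi-identifiers (e.g. coarser age bands or shorter ZIP prefixes) until a check like this passes while data quality remains as high as possible.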
Affiliation(s)
- Fabian Prasser
- Chair of Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Munich, 81675, Germany.
- Florian Kohlmayer
- Chair of Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Munich, 81675, Germany
- Klaus A Kuhn
- Chair of Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Munich, 81675, Germany