1
Abu Attieh H, Neves DT, Guedes M, Mirandola M, Dellacasa C, Rossi E, Prasser F. A Scalable Pseudonymization Tool for Rapid Deployment in Large Biomedical Research Networks: Development and Evaluation Study. JMIR Med Inform 2024; 12:e49646. [PMID: 38654577 PMCID: PMC11063579 DOI: 10.2196/49646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/03/2023] [Accepted: 03/07/2024] [Indexed: 04/26/2024] Open
Abstract
Background: The SARS-CoV-2 pandemic has demonstrated once again that rapid collaborative research is essential for the future of biomedicine. Large research networks are needed to collect, share, and reuse data and biosamples to generate collaborative evidence. However, setting up such networks is often complex and time-consuming, as common tools and policies are needed to ensure interoperability and the required flows of data and samples, especially for handling personal data and the associated data protection issues. In biomedical research, pseudonymization detaches directly identifying details from biomedical data and biosamples and connects them using secure identifiers, the so-called pseudonyms. This protects privacy by design but allows the necessary linkage and reidentification.
Objective: Although pseudonymization is used in almost every biomedical study, there are currently no pseudonymization tools that can be rapidly deployed across many institutions. Moreover, using centralized services is often not possible, for example, when data are reused and consent for this type of data processing is lacking. We present the ORCHESTRA Pseudonymization Tool (OPT), developed under the umbrella of the ORCHESTRA consortium, which faced exactly these challenges when it came to rapidly establishing a large-scale research network in the context of the rapid pandemic response in Europe.
Methods: To overcome challenges caused by the heterogeneity of IT infrastructures across institutions, the OPT was developed based on programmable runtime environments available at practically every institution: office suites. The software is highly configurable and provides many features, from subject and biosample registration to record linkage and the printing of machine-readable codes for labeling biosample tubes. Special care has been taken to ensure that the algorithms implemented are efficient so that the OPT can be used to pseudonymize large data sets, which we demonstrate through a comprehensive evaluation.
Results: The OPT is available for Microsoft Office and LibreOffice, so it can be deployed on Windows, Linux, and macOS. It provides multiuser support and is configurable to meet the needs of different types of research projects. Within the ORCHESTRA research network, the OPT has been successfully deployed at 13 institutions in 11 countries in Europe and beyond. As of June 2023, the software manages data about more than 30,000 subjects and 15,000 biosamples. Over 10,000 labels have been printed. The results of our experimental evaluation show that the OPT offers practical response times for all major functionalities, pseudonymizing 100,000 subjects in 10 seconds using Microsoft Excel and in 54 seconds using LibreOffice.
Conclusions: Innovative solutions are needed to make the process of establishing large research networks more efficient. The OPT, which leverages the runtime environment of common office suites, can be used to rapidly deploy pseudonymization and biosample management capabilities across research networks. The tool is highly configurable and available as open-source software.
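The separation that pseudonymization relies on (identifying details kept apart from research data and connected only through pseudonyms) can be illustrated with a minimal Python sketch. This is not the OPT's implementation, which according to the abstract runs inside office suites. The class name `PseudonymRegistry`, the `SUBJ-` prefix, the random-token scheme, and the sample values are all assumptions made for illustration.

```python
import secrets


class PseudonymRegistry:
    """Hypothetical in-memory registry; not the OPT's actual design."""

    def __init__(self, prefix="SUBJ-"):
        self._prefix = prefix
        self._by_identity = {}   # identifying details -> pseudonym
        self._by_pseudonym = {}  # pseudonym -> identifying details

    def register(self, name, birth_date):
        """Return the subject's existing pseudonym, or create a new one."""
        # Naive normalization stands in for real record linkage.
        identity = (name.strip().lower(), birth_date)
        if identity in self._by_identity:
            return self._by_identity[identity]
        pseudonym = self._prefix + secrets.token_hex(8)
        while pseudonym in self._by_pseudonym:  # guard against rare collisions
            pseudonym = self._prefix + secrets.token_hex(8)
        self._by_identity[identity] = pseudonym
        self._by_pseudonym[pseudonym] = identity
        return pseudonym

    def reidentify(self, pseudonym):
        """Authorized lookup from a pseudonym back to the identity."""
        return self._by_pseudonym[pseudonym]


registry = PseudonymRegistry()
p1 = registry.register("Jane Doe", "1980-01-31")
p2 = registry.register("jane doe ", "1980-01-31")  # same subject, linked

# The research record carries only the pseudonym (made-up sample value).
research_record = {"pseudonym": p1, "crp_mg_per_l": 12.4}
```

Only the registry can map a pseudonym back to a person, so the research record can be shared without directly identifying details while authorized reidentification remains possible.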
Affiliation(s)
- Hammam Abu Attieh
  - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Diogo Telmo Neves
  - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Mariana Guedes
  - Infection and Antimicrobial Resistance Control and Prevention Unit, Centro Hospitalar Universitário São João, Porto, Portugal
  - Infectious Diseases and Microbiology Division, Hospital Universitario Virgen Macarena, Sevilla, Spain
  - Department of Medicine, University of Sevilla/Instituto de Biomedicina de Sevilla (IBiS)/Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain
- Massimo Mirandola
  - Infectious Diseases Division, Diagnostic and Public Health Department, University of Verona, Verona, Italy
- Chiara Dellacasa
  - High Performance Computing (HPC) Department, CINECA - Consorzio Interuniversitario, Bologna, Italy
- Elisa Rossi
  - High Performance Computing (HPC) Department, CINECA - Consorzio Interuniversitario, Bologna, Italy
- Fabian Prasser
  - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
2
Wündisch E, Hufnagl P, Brunecker P, Meier Zu Ummeln S, Träger S, Kopp M, Prasser F, Weber J. Development of a Trusted Third Party at a Large University Hospital: Design and Implementation Study. JMIR Med Inform 2024; 12:e53075. [PMID: 38632712 DOI: 10.2196/53075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Revised: 02/15/2024] [Accepted: 02/17/2024] [Indexed: 04/19/2024] Open
Abstract
Background: Pseudonymization has become a best practice to securely manage the identities of patients and study participants in medical research projects and data sharing initiatives. This method offers the advantage of not requiring the direct identification of data to support various research processes while still allowing for advanced processing activities, such as data linkage. Often, pseudonymization and related functionalities are bundled in specific technical and organization units known as trusted third parties (TTPs). However, pseudonymization can significantly increase the complexity of data management and research workflows, necessitating adequate tool support. Common tasks of TTPs include supporting the secure registration and pseudonymization of patient and sample identities as well as managing consent.
Objective: Despite the challenges involved, little has been published about successful architectures and functional tools for implementing TTPs in large university hospitals. The aim of this paper is to fill this research gap by describing the software architecture and tool set developed and deployed as part of a TTP established at Charité - Universitätsmedizin Berlin.
Methods: The infrastructure for the TTP was designed to provide a modular structure while keeping maintenance requirements low. Basic functionalities were realized with the free MOSAIC tools. However, supporting common study processes requires implementing workflows that span different basic services, such as patient registration, followed by pseudonym generation and concluded by consent collection. To achieve this, an integration layer was developed to provide a unified representational state transfer (REST) application programming interface (API) as a basis for more complex workflows. Based on this API, a unified graphical user interface was also implemented, providing an integrated view of information objects and workflows supported by the TTP. The API was implemented using Java and Spring Boot, while the graphical user interface was implemented in PHP and Laravel. Both services use a shared Keycloak instance as a unified management system for roles and rights.
Results: By the end of 2022, the TTP had supported more than 10 research projects since its launch in December 2019. Within these projects, more than 3000 identities were stored, more than 30,000 pseudonyms were generated, and more than 1500 consent forms were submitted. In total, more than 150 people regularly work with the software platform. By implementing the integration layer and the unified user interface, together with comprehensive roles and rights management, the effort for operating the TTP could be significantly reduced, as personnel of the supported research projects can use many functionalities independently.
Conclusions: With the architecture and components described, we created a user-friendly and compliant environment for supporting research projects. We believe that the insights into the design and implementation of our TTP can help other institutions to efficiently and effectively set up corresponding structures.
Affiliation(s)
- Eric Wündisch
  - Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Peter Hufnagl
  - Digital Pathology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Peter Brunecker
  - Core Unit Research IT, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sophie Meier Zu Ummeln
  - Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Sarah Träger
  - Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Marcus Kopp
  - Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Fabian Prasser
  - Medical Informatics Group, Center of Health Data Science, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
- Joachim Weber
  - Core Unit THS, Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  - Center for Stroke Research Berlin, Charité - Universitätsmedizin Berlin, Berlin, Germany
  - German Centre for Cardiovascular Research (DZHK), Berlin, Germany
3
Baum L, Johns M, Poikela M, Möller R, Ananthasubramaniam B, Prasser F. Data integration and analysis for circadian medicine. Acta Physiol (Oxf) 2023; 237:e13951. [PMID: 36790321 DOI: 10.1111/apha.13951] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/22/2022] [Revised: 02/04/2023] [Accepted: 02/12/2023] [Indexed: 02/16/2023]
Abstract
Data integration, data sharing, and standardized analyses are important enablers for data-driven medical research. Circadian medicine is an emerging field with a particularly high need for coordinated and systematic collaboration between researchers from different disciplines. Datasets in circadian medicine are multimodal, ranging from molecular circadian profiles and clinical parameters to physiological measurements and data obtained from (wearable) sensors or reported by patients. Uniquely, data spanning both the time dimension and the spatial dimension (across tissues) are needed to obtain a holistic view of the circadian system. The study of human rhythms in the context of circadian medicine has to confront the heterogeneity of clock properties within and across subjects and our inability to repeatedly obtain relevant biosamples from one subject. This requires informatics solutions for integrating and visualizing relevant data types at various temporal resolutions ranging from milliseconds and seconds to minutes and several hours. Associated challenges range from a lack of standards that can be used to represent all required data in a common interoperable form, to challenges related to data storage, to the need to perform transformations for integrated visualizations, and to privacy issues. The downstream analysis of circadian rhythms requires specialized approaches for the identification, characterization, and discrimination of rhythms. We conclude that circadian medicine research provides an ideal environment for developing innovative methods to address challenges related to the collection, integration, visualization, and analysis of multimodal multidimensional biomedical data.
Affiliation(s)
- Lena Baum
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Marco Johns
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Maija Poikela
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
- Ralf Möller
  - Institute of Information Systems, University of Lübeck, Lübeck, Germany
- Fabian Prasser
  - Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany
4
Crossfield SSR, Zucker K, Baxter P, Wright P, Fistein J, Markham AF, Birkin M, Glaser AW, Hall G. A data flow process for confidential data and its application in a health research project. PLoS One 2022; 17:e0262609. [PMID: 35061834 PMCID: PMC8782367 DOI: 10.1371/journal.pone.0262609] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/13/2021] [Accepted: 12/29/2021] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND: The use of linked healthcare data in research has the potential to make major contributions to knowledge generation and service improvement. However, using healthcare data for secondary purposes raises legal and ethical concerns relating to confidentiality, privacy and data protection rights. Using a linkage and anonymisation approach that processes data lawfully and in line with ethical best practice to create an anonymous (non-personal) dataset can address these concerns, yet there is no set approach for defining all of the steps involved in such data flow end-to-end. We aimed to define such an approach with clear steps for dataset creation, and to describe its utilisation in a case study linking healthcare data.
METHODS: We developed a data flow protocol that generates pseudonymous datasets that can be reversibly linked, or irreversibly linked to form an anonymous research dataset. It was designed and implemented by the Comprehensive Patient Records (CPR) study in Leeds, UK.
RESULTS: We defined a clear approach that received ethico-legal approval for use in creating an anonymous research dataset. Our approach used individual-level linkage through a mechanism that is not computer-intensive and was rendered irreversible to both data providers and processors. We successfully applied it in the CPR study to hospital and general practice and community electronic health record data from two providers, along with patient reported outcomes, for 365,193 patients. The resultant anonymous research dataset is available via DATA-CAN, the Health Data Research Hub for Cancer in the UK.
CONCLUSIONS: Through ethical, legal and academic review, we believe that we contribute a defined approach that represents a framework that exceeds current minimum standards for effective pseudonymisation and anonymisation. This paper describes our methods and provides supporting information to facilitate the use of this approach in research.
Affiliation(s)
- Kieran Zucker
  - Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, United Kingdom
- Paul Baxter
  - Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, United Kingdom
- Penny Wright
  - Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, United Kingdom
- Jon Fistein
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
- Alex F. Markham
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
  - Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, United Kingdom
- Mark Birkin
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
- Adam W. Glaser
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
  - Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, United Kingdom
- Geoff Hall
  - Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
  - Leeds Institute of Medical Research at St James’s, University of Leeds, Leeds, United Kingdom
5
Abstract
Cryptography is traditionally considered as a main information security mechanism, providing several security services such as confidentiality, as well as data and entity authentication. This aspect is clearly relevant to the fundamental human right of privacy, in terms of securing data from eavesdropping and tampering, as well as from masquerading their origin. However, cryptography may also support several other (legal) requirements related to privacy. For example, in order to fulfil the data minimisation principle—i.e., to ensure that the personal data that are being processed are adequate and limited only to what is necessary in relation to the purposes for which they are processed—the use of advanced cryptographic techniques such as secure computations, zero-knowledge proofs or homomorphic encryption may be prerequisite. In practice though, it seems that the organisations performing personal data processing are not fully aware of such solutions, thus adopting techniques that pose risks for the rights of individuals. This paper aims to provide a generic overview of the possible cryptographic applications that suffice to address privacy challenges. In the process, we shall also state our view on the public “debate” on finding ways so as to allow law enforcement agencies to bypass the encryption of communication.
6
Pedrosa M, Zuquete A, Costa C. A Pseudonymisation Protocol With Implicit and Explicit Consent Routes for Health Records in Federated Ledgers. IEEE J Biomed Health Inform 2021; 25:2172-2183. [PMID: 33006933 DOI: 10.1109/jbhi.2020.3028454] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Healthcare data for primary use (diagnosis) may be encrypted for confidentiality purposes; however, secondary uses such as feeding machine learning algorithms require open access. Full anonymity has no traceable identifiers to report diagnosis results. Moreover, implicit and explicit consent routes are of practical importance under recent data protection regulations (GDPR), translating directly into break-the-glass requirements. Pseudonymisation is an acceptable compromise when dealing with such orthogonal requirements and is an advisable measure to protect data. Our work presents a pseudonymisation protocol that is compliant with implicit and explicit consent routes. The protocol is constructed on a (t,n)-threshold secret sharing scheme and public key cryptography. The pseudonym is safely derived from a fragment of public information without requiring any data-subject's secret. The method is proven secure under reasonable cryptographic assumptions and scalable from the experimental results.
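The (t,n)-threshold idea mentioned in the abstract can be sketched in Python as follows. The abstract does not say which threshold scheme the protocol uses; this sketch assumes Shamir's polynomial-based scheme, and the names `split_secret`, `recover_secret`, and `PRIME` are hypothetical. It illustrates the general technique only, not the authors' protocol.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; secrets must be smaller than this


def split_secret(secret, t, n):
    """Split `secret` into n shares so that any t of them recover it."""
    if not (1 <= t <= n) or not (0 <= secret < PRIME):
        raise ValueError("need 1 <= t <= n and 0 <= secret < PRIME")
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, t=3, n=5)
recovered = recover_secret(shares[:3])  # any 3 of the 5 shares suffice
```

With fewer than t shares, the interpolated value is unrelated to the secret, which is what allows a threshold of parties to be required before a protected value can be reconstructed.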
7
Pung J, Rienhoff O. Key components and IT assistance of participant management in clinical research: a scoping review. JAMIA Open 2020; 3:449-458. [PMID: 33215078 PMCID: PMC7660951 DOI: 10.1093/jamiaopen/ooaa041] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/12/2019] [Revised: 07/16/2020] [Accepted: 08/24/2020] [Indexed: 01/05/2023] Open
Abstract
Objectives: Managing participants and their data is fundamental for the success of a clinical trial. Our review identifies and describes processes that deal with management of trial participants and highlights information technology (IT) assistance for clinical research in the context of participant management.
Methods: A scoping literature review design, based on the Preferred Reporting Items for Systematic Reviews and Meta-analyses statement, was used to identify literature on trial participant-related proceedings, work procedures, or workflows, and assisting electronic systems.
Results: The literature search identified 1329 articles of which 111 were included for analysis. Participant-related procedures were categorized into 4 major trial processes: recruitment, obtaining informed consent, managing identities, and managing administrative data. Our results demonstrated that management of trial participants is considered in nearly every step of clinical trials, and that IT was successfully introduced to all participant-related areas of a clinical trial to facilitate processes.
Discussion: There is no precise definition of participant management, so a broad search strategy was necessary, resulting in a high number of articles that had to be excluded. Nevertheless, this review provides a comprehensive overview of participant management-related components, which was lacking so far. The review contributes to a better understanding of how computer-assisted management of participants in clinical trials is possible.
Affiliation(s)
- Johannes Pung
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
- Otto Rienhoff
  - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
8
Kohlmayer F, Lautenschläger R, Prasser F. Pseudonymization for research data collection: is the juice worth the squeeze? BMC Med Inform Decis Mak 2019; 19:178. [PMID: 31484555 PMCID: PMC6727563 DOI: 10.1186/s12911-019-0905-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 03/01/2018] [Accepted: 08/29/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The collection of data and biospecimens which characterize patients and probands in-depth is a core element of modern biomedical research. Relevant data must be considered highly sensitive and it needs to be protected from unauthorized use and re-identification. In this context, laws, regulations, guidelines and best-practices often recommend or mandate pseudonymization, which means that directly identifying data of subjects (e.g. names and addresses) is stored separately from data which is primarily needed for scientific analyses.
DISCUSSION: When (authorized) re-identification of subjects is not an exceptional but a common procedure, e.g. due to longitudinal data collection, implementing pseudonymization can significantly increase the complexity of software solutions. For example, data stored in distributed databases need to be dynamically combined with each other, which requires additional interfaces for communicating between the various subsystems. This increased complexity may lead to new attack vectors for intruders. Obviously, this is in contrast to the objective of improving data protection. What is lacking is a standardized process of evaluating and reporting risks, threats and countermeasures, which can be used to test whether integrating pseudonymization methods into data collection systems actually improves upon the degree of protection provided by system designs that simply follow common IT security best practices and implement fine-grained role-based access control models. To demonstrate that the methods used to describe systems employing pseudonymized data management are currently heterogeneous and ad-hoc, we examined the extent to which twelve recent studies address each of the six basic security properties defined by the International Organization for Standardization (ISO) standard 27000. We show inconsistencies across the studies, with most of them failing to mention one or more security properties.
CONCLUSION: We discuss the degree of privacy protection provided by implementing pseudonymization into research data collection processes. We conclude that (1) more research is needed on the interplay of pseudonymity, information security and data protection, (2) problem-specific guidelines for evaluating and reporting risks, threats and countermeasures should be developed and that (3) future work on pseudonymized research data collection should include the results of such structured and integrated analyses.
Affiliation(s)
- Florian Kohlmayer
  - Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Ronald Lautenschläger
  - Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
- Fabian Prasser
  - Institute of Medical Informatics, Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Munich, Germany
9
Jonas S, Siewert S, Spreckelsen C. Privacy-Preserving Record Grouping and Consent Management Based on a Public-Private Key Signature Scheme: Theoretical Analysis and Feasibility Study. J Med Internet Res 2019; 21:e12300. [PMID: 30977738 PMCID: PMC6484261 DOI: 10.2196/12300] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/24/2018] [Revised: 01/02/2019] [Accepted: 01/03/2019] [Indexed: 11/24/2022] Open
Abstract
Background: Clinical and social trials create evidence that enables medical progress. However, the gathering of personal and patient data requires high security and privacy standards. Direct linking of personal information and medical data is commonly hidden through pseudonymization. While this makes unauthorized access to personal medical data more difficult, a centralized pseudonymization list can still pose a security risk. In addition, medical data linked via pseudonyms can still be used for data-driven reidentification.
Objective: Our objective was to propose a novel approach to pseudonymization based on public-private key cryptography that allows (1) decentralized patient-driven creation and maintenance of pseudonyms, (2) 1-time pseudonymization of each data record, and (3) grouping of patient data records even without knowing the pseudonymization key.
Methods: Based on public-private key cryptography, we set up a signing mechanism for patient data records and detailed the workflows for (1) user registration, (2) user log-in, (3) record storing, and (4) record grouping. We evaluated the proposed mechanism for performance, examined the potential risks based on cryptographic collision, and carried out a threat analysis.
Results: The performance analysis showed that all workflows could be performed with an average runtime of 0.057 to 42.320 ms (user registration), 0.083 to 0.606 ms (record creation), and 0.005 to 0.198 ms (record grouping) depending on the chosen cryptographic tools. We expected no realistic risk of cryptographic collision in the proposed system, and the threat analysis revealed that 3 distinct server systems of the proposed setup had to be compromised to allow access to combined medical data and private data. However, this would still allow only for data-driven deidentification. For a full reidentification, all 3 trial servers and all study participants would have to be compromised. In addition, the approach supports consent management, automatically anonymizes the data after trial closure, and provides basic mechanisms against data forging.
Conclusions: The proposed approach has a high security and privacy level in comparison with traditional centralized pseudonymization approaches and does not require a trusted third party. The only drawback in comparison with central pseudonymization is the directed feedback of accidental findings to individual participants, as this is not possible with a quasi-anonymous storage of patient data.
Affiliation(s)
- Stephan Jonas
  - Department of Informatics, Technical University of Munich, Garching, Germany
- Simon Siewert
  - Department of Medical Informatics, Uniklinik Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
- Cord Spreckelsen
  - Department of Medical Informatics, Uniklinik Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
10
Bruland P, Doods J, Brix T, Dugas M, Storck M. Connecting healthcare and clinical research: Workflow optimizations through seamless integration of EHR, pseudonymization services and EDC systems. Int J Med Inform 2018; 119:103-108. [PMID: 30342678 DOI: 10.1016/j.ijmedinf.2018.09.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 03/21/2018] [Revised: 07/02/2018] [Accepted: 09/06/2018] [Indexed: 11/30/2022]
Abstract
OBJECTIVE: In recent years, several projects have promoted the secondary use of routine healthcare data based on electronic health record (EHR) data. In multicenter studies, dedicated pseudonymization services are applied for unified pseudonym handling. Healthcare, clinical research and pseudonymization systems are generally disconnected. Hence, the aim of this research work is to integrate these applications and to evaluate the workflow of clinical research.
METHODS: We analyzed and identified technical solutions for legislation-compliant automatic pseudonym generation and for the integration into EHR as well as electronic data capture (EDC) systems. The Mainzelliste was used as the pseudonymization service, which is available as an open source solution and compliant with the data privacy concept in Germany. The subjects of the integration were the local EHR and an in-house developed EDC system. A time and motion study was conducted to evaluate the effects on the workflow.
RESULTS: Integration of EHR, pseudonymization service and EDC systems is technically feasible and leads to a less fragmented usage of all applications. Generated pseudonyms are obtained from the service hosted at a trusted third party and can now be used in the EDC as well as in the EHR system for direct access and re-identification. The evaluation of 90 registration iterations shows that the time for documentation has been significantly reduced on average by 39.6 s (56.3%) from 71 ± 8 s to 31 ± 5 s per registered study patient.
CONCLUSIONS: By incorporating EHR, EDC and pseudonymization systems, it is now feasible to support multicenter studies and registers out of an integrated system landscape within a hospital. Optimizing the workflow of patient registration for clinical research allows reduction of double data entry and transcription errors as well as a seamless transition from clinical routine to research data collection.
Affiliation(s)
- Philipp Bruland
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Justin Doods
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Tobias Brix
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, University of Münster, Münster, Germany
- Michael Storck
  - Institute of Medical Informatics, University of Münster, Münster, Germany
11
Jackson R, Kartoglu I, Stringer C, Gorrell G, Roberts A, Song X, Wu H, Agrawal A, Lui K, Groza T, Lewsley D, Northwood D, Folarin A, Stewart R, Dobson R. CogStack - experiences of deploying integrated information retrieval and extraction services in a large National Health Service Foundation Trust hospital. BMC Med Inform Decis Mak 2018; 18:47. [PMID: 29941004 PMCID: PMC6020175 DOI: 10.1186/s12911-018-0623-9] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 03/20/2017] [Accepted: 06/01/2018] [Indexed: 03/05/2023] Open
Abstract
BACKGROUND Traditional health information systems are generally devised to support clinical data collection at the point of care. However, as the significance of the modern information economy expands in scope and permeates the healthcare domain, there is an increasing urgency for healthcare organisations to offer information systems that address the expectations of clinicians, researchers and the business intelligence community alike. Amongst other emergent requirements, the principal unmet need might be defined as the 3R principle (right data, right place, right time) to address deficiencies in organisational data flow while retaining the strict information governance policies that apply within the UK National Health Service (NHS). Here, we describe our work on creating and deploying a low cost structured and unstructured information retrieval and extraction architecture within King's College Hospital, the management of governance concerns and the associated use cases and cost saving opportunities that such components present. RESULTS To date, our CogStack architecture has processed over 300 million lines of clinical data, making it available for internal service improvement projects at King's College London. On generated data designed to simulate real world clinical text, our de-identification algorithm achieved up to 94% precision and up to 96% recall. CONCLUSION We describe a toolkit which we feel is of huge value to the UK (and beyond) healthcare community. It is the only open source, easily deployable solution designed for the UK healthcare environment, in a landscape populated by expensive proprietary systems. Solutions such as these provide a crucial foundation for the genomic revolution in medicine.
Affiliation(s)
- Richard Jackson
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigny Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ UK
- Ismail Kartoglu
- InterDigital Communications, 64 Great Eastern Street, 1st Floor, London, EC2A 3QR UK
- Clive Stringer
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
- Angus Roberts
- University of Sheffield, Western Bank, Sheffield, S10 2TN UK
- Xingyi Song
- University of Sheffield, Western Bank, Sheffield, S10 2TN UK
- Honghan Wu
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigny Park, London, SE5 8AF UK
- Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, EH16 4UX UK
- Asha Agrawal
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
- Kenneth Lui
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT UK
- Tudor Groza
- Garvan Institute of Medical Research, Sydney, NSW 2010 Australia
- Damian Lewsley
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
- Doug Northwood
- King’s College Hospital, Denmark Hill, London, SE5 9RS UK
- Amos Folarin
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigny Park, London, SE5 8AF UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT UK
- Robert Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigny Park, London, SE5 8AF UK
- South London and Maudsley NHS Foundation Trust, Denmark Hill, London, SE5 8AZ UK
- Richard Dobson
- Institute of Psychiatry, Psychology and Neuroscience, King’s College London, 16 De Crespigny Park, London, SE5 8AF UK
- Farr Institute of Health Informatics Research, UCL Institute of Health Informatics, University College London, London, WC1E 6BT UK
|
12
|
Winter A, Takabayashi K, Jahn F, Kimura E, Engelbrecht R, Haux R, Honda M, Hübner UH, Inoue S, Kohl CD, Matsumoto T, Matsumura Y, Miyo K, Nakashima N, Prokosch HU, Staemmler M. Quality Requirements for Electronic Health Record Systems. A Japanese-German Information Management Perspective. Methods Inf Med 2017; 56:e92-e104. [PMID: 28925415 PMCID: PMC6291988 DOI: 10.3414/me17-05-0002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/23/2017] [Accepted: 06/13/2017] [Indexed: 12/16/2022]
Abstract
BACKGROUND For more than 30 years, there has been close cooperation between Japanese and German scientists with regard to information systems in health care. Collaboration has been formalized by an agreement between the respective scientific associations. Following this agreement, two joint workshops took place to explore the similarities and differences of electronic health record systems (EHRS) against the background of the two national healthcare systems that share many commonalities. OBJECTIVES To establish a framework and requirements for the quality of EHRS that may also serve as a basis for comparing different EHRS. METHODS Donabedian's three dimensions of quality of medical care were adapted to the outcome, process, and structural quality of EHRS and their management. These quality dimensions were proposed before the first workshop of EHRS experts and enriched during the discussions. RESULTS The Quality Requirements Framework of EHRS (QRF-EHRS) was defined and complemented by requirements for high quality EHRS. The framework integrates three quality dimensions (outcome, process, and structural quality), three layers of information systems (processes and data, applications, and physical tools) and three dimensions of information management (strategic, tactical, and operational information management). CONCLUSIONS Describing and comparing the quality of EHRS is in fact a multidimensional problem as given by the QRF-EHRS framework. This framework will be utilized to compare Japanese and German EHRS, notably those that were presented at the second workshop.
Affiliation(s)
- Alfred Winter
- Prof. Alfred Winter, University of Leipzig, Institute for Medical Informatics, Statistics and Epidemiology, Haertelstr. 16-18, 04107 Leipzig, Germany
|
13
|
Lautenschläger R, Kohlmayer F, Prasser F, Kuhn KA. A generic solution for web-based management of pseudonymized data. BMC Med Inform Decis Mak 2015; 15:100. [PMID: 26621059 PMCID: PMC4665916 DOI: 10.1186/s12911-015-0222-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 05/21/2015] [Accepted: 11/25/2015] [Indexed: 12/02/2022] Open
Abstract
BACKGROUND Collaborative collection and sharing of data have become a core element of biomedical research. Typical applications are multi-site registries which collect sensitive person-related data prospectively, often together with biospecimens. To secure these sensitive data, national and international data protection laws and regulations demand the separation of identifying data from biomedical data and the introduction of pseudonyms. Neither the formulation in laws and regulations nor existing pseudonymization concepts, however, are precise enough to directly provide an implementation guideline. We therefore describe core requirements as well as implementation options for registries and study databases with sensitive biomedical data. METHODS We first analyze existing concepts and compile a set of fundamental requirements for pseudonymized data management. Then we derive a system architecture that fulfills these requirements. Next, we provide a comprehensive overview and a comparison of different technical options for an implementation. Finally, we develop a generic software solution for managing pseudonymized data and show its feasibility by describing how we have used it to realize two research networks. RESULTS We have found that pseudonymization models are highly heterogeneous, already on a conceptual level. We have compiled a set of requirements from different pseudonymization schemes. We propose an architecture and present an overview of technical options. Based on a selection of technical elements, we suggest a generic solution. It supports the multi-site collection and management of biomedical data. Security measures are multi-tier pseudonymity and physical separation of data over independent backend servers. Integrated views are provided by a web-based user interface. Our approach has been successfully used to implement a national and an international rare disease network.
CONCLUSIONS We were able to identify a set of core requirements out of several pseudonymization models. Considering various implementation options, we realized a generic solution which was implemented and deployed in research networks. Still, further conceptual work on pseudonymity is needed. Specifically, it remains unclear how exactly data is to be separated into distributed subsets. Moreover, a thorough risk and threat analysis is needed.
Affiliation(s)
- Ronald Lautenschläger
- Chair for Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Grillparzerstraße 18, 81675 Munich, Germany
- Florian Kohlmayer
- Chair for Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Grillparzerstraße 18, 81675 Munich, Germany
- Fabian Prasser
- Chair for Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Grillparzerstraße 18, 81675 Munich, Germany
- Klaus A. Kuhn
- Chair for Biomedical Informatics, Department of Medicine, Technical University of Munich (TUM), Grillparzerstraße 18, 81675 Munich, Germany
|
14
|
Bickford J, Nisker J. Tensions between anonymity and thick description when "studying up" in genetics research. Qual Health Res 2015; 25:276-282. [PMID: 25239566 DOI: 10.1177/1049732314552194] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Anonymity, according to Tilley and Woodthorpe, refers to removing or obscuring participant information, whereas "confidentiality refers to the management of private information." Both are major considerations for ethics review boards, but can be challenges when "studying up" in qualitative research because of the depth, precision, and uniqueness of the information, and the prominence of research participants. In anthropology, providing detailed and nuanced accounts of particular spaces, events, and conditions is essential. Actions taken to hide or gloss over these particulars would impede the ability to demonstrate authenticity, validity, and verisimilitude. As social science moves into field sites such as cutting-edge genomics, where participants might be identified through their particular contributions when studying up, strategies to decrease the friction between descriptive methodologies and the requirement for anonymity need to be developed. We conclude with recommendations for researchers and members of research ethics boards regarding how to anticipate and mitigate this tension.
Affiliation(s)
- Jeff Nisker
- University of Western Ontario, London, Ontario, Canada
|