1
Bond A. Margin notes from the COVID-19 pandemic for the future of healthcare innovation. Healthc Manage Forum 2023; 36:393-398. [PMID: 37439203 PMCID: PMC10345824 DOI: 10.1177/08404704231185487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/14/2023]
Abstract
The COVID-19 pandemic has been characterized as a "big-event disruption" that fundamentally challenged the sustainability of existing healthcare business and service models and demanded innovation through "dual transformation" simultaneously to both core operations and the evolution of new strategic directions. The concept of disruptive innovation as applied to healthcare is reviewed and the strategies of distributed healthcare organizations supporting the most medically and socially complex communities during the COVID-19 pandemic are described as demonstrative of the promise of disruptive innovation in healthcare to bring about the necessary shift away from acute and facility-based care to integrated health and social care in the community. The place of new digital health technologies including "big data" analytics, digital platforms, and artificial intelligence/machine learning is identified as being integral to optimizing the scale and scope of impact of distributed community health and social care.
Affiliation(s)
- Andrew Bond
- Inner City Health Associates, Toronto, Ontario, Canada
2
Reno JE, Ong TC, Voong C, Morse B, Ytell K, Koren R, Kwan BM. Engaging Patients and Other Stakeholders in "Designing for Dissemination" of Record Linkage Methods and Tools. Appl Clin Inform 2023; 14:670-683. [PMID: 37276886 PMCID: PMC10446912 DOI: 10.1055/a-2105-6505] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Accepted: 06/01/2023] [Indexed: 06/07/2023] Open
Abstract
BACKGROUND Novel record linkage (RL) methods have the potential to enhance clinical informatics by integrating patient data from multiple sources, including electronic health records, insurance claims, and digital health devices, to inform patient-centered care. Engaging patients and other stakeholders in the use of RL methods in patient-centered outcomes research (PCOR) is a key step in ensuring RL methods are viewed as acceptable, appropriate, and useful. The University of Colorado Record Linkage (CURL) platform empowers the use of RL in PCOR. OBJECTIVES This study aimed to describe the process of engaging patients and other stakeholders in the design of an RL dissemination package to support the use of RL methods in PCOR. METHODS Customer discovery, value proposition design, and user experience methods were used to iteratively develop an RL dissemination package that includes animated explainer videos for patients and an RL research planning workbook for researchers. Patients and other stakeholders (researchers, data managers, and regulatory officials) were engaged in the RL dissemination package design. RESULTS Patient partners emphasized the importance of conveying how RL methods may benefit patients and the rules researchers must follow to protect the privacy and security of patient data. Other stakeholders described accuracy, flexibility, efficiency, and data security compared with other available RL solutions. Dissemination package communication products reflect the value propositions identified by key stakeholders. As prioritized by patients, the animated explainer videos emphasize the data privacy and security processes and procedures employed when performing research using RL. The RL workbook addresses researchers' and data managers' needs to iteratively design RL projects and provides accompanying resources to alleviate leadership and regulatory officials' concerns about data regulation compliance.
CONCLUSION Dissemination products to promote adoption and use of CURL include materials to facilitate patient engagement in RL research and investigator step-by-step decision-making materials about the integration of RL methods in PCOR.
Affiliation(s)
- Jenna E. Reno
- RTI International, Center for Communication and Engagement Research, Research Triangle Park, North Carolina, United States
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- Toan C. Ong
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- Chan Voong
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- Brad Morse
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- Kate Ytell
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- Ramona Koren
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- Bethany M. Kwan
- School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
- Adult and Child Center for Health Outcomes Research and Delivery Science, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States
3
Marsolo K, Kiernan D, Toh S, Phua J, Louzao D, Haynes K, Weiner M, Angulo F, Bailey C, Bian J, Fort D, Grannis S, Krishnamurthy AK, Nair V, Rivera P, Silverstein J, Zirkle M, Carton T. Assessing the impact of privacy-preserving record linkage on record overlap and patient demographic and clinical characteristics in PCORnet®, the National Patient-Centered Clinical Research Network. J Am Med Inform Assoc 2022; 30:447-455. [PMID: 36451264 PMCID: PMC9933062 DOI: 10.1093/jamia/ocac229] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/26/2022] [Revised: 11/03/2022] [Accepted: 11/16/2022] [Indexed: 12/02/2022] Open
Abstract
OBJECTIVE This article describes the implementation of a privacy-preserving record linkage (PPRL) solution across PCORnet®, the National Patient-Centered Clinical Research Network. MATERIAL AND METHODS Using a PPRL solution from Datavant, we quantified the degree of patient overlap across the network and report a de-duplicated analysis of the demographic and clinical characteristics of the PCORnet population. RESULTS There were ∼170M patient records across the responding Network Partners, with ∼138M (81%) of those corresponding to a unique patient. 82.1% of patients were found in a single partner and 14.7% were in 2. The percentage overlap between Partners ranged between 0% and 80% with a median of 0%. Linking patients' electronic health records with claims increased disease prevalence in every clinical characteristic, ranging between 63% and 173%. DISCUSSION The overlap between Partners was variable and depended on timeframe. However, patient data linkage changed the prevalence profile of the PCORnet patient population. CONCLUSIONS This project was one of the largest linkage efforts of its kind and demonstrates the potential value of record linkage. Linkage between Partners may be most useful in cases where there is geographic proximity between Partners, an expectation that potential linkage Partners will be able to fill gaps in data, or a longer study timeframe.
Affiliation(s)
- Keith Marsolo
- Corresponding Author: Keith Marsolo, PhD, Department of Population Health Sciences, Duke University School of Medicine, 215 Morris Street, Durham, NC 27710, USA
- Daniel Kiernan
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA
- Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, Massachusetts, USA
- Darcy Louzao
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, North Carolina, USA
- Kevin Haynes
- Scientific Affairs, HealthCore, Inc., Wilmington, Delaware, USA
- Mark Weiner
- Department of Medicine, Weill Cornell Medicine, New York, New York, USA
- Francisco Angulo
- Department of Medicine, Cook County Health and Hospital System, Chicago, Illinois, USA
- Charles Bailey
- Department of Pediatrics, Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Jiang Bian
- Department of Health Outcomes and Bioinformatics, College of Medicine, University of Florida, Gainesville, Florida, USA
- Daniel Fort
- Center for Outcomes and Health Services Research, Ochsner Health, New Orleans, Louisiana, USA
- Shaun Grannis
- Regenstrief Institute, Indiana University, Indianapolis, Indiana, USA
- Jonathan Silverstein
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
- Thomas Carton
- Louisiana Public Health Institute, New Orleans, Louisiana, USA
4
Kiernan D, Carton T, Toh S, Phua J, Zirkle M, Louzao D, Haynes K, Weiner M, Angulo F, Bailey C, Bian J, Fort D, Grannis S, Krishnamurthy AK, Nair V, Rivera P, Silverstein J, Marsolo K. Establishing a framework for privacy-preserving record linkage among electronic health record and administrative claims databases within PCORnet®, the National Patient-Centered Clinical Research Network. BMC Res Notes 2022; 15:337. [PMID: 36316778 PMCID: PMC9620597 DOI: 10.1186/s13104-022-06243-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/04/2022] [Accepted: 10/21/2022] [Indexed: 12/02/2022] Open
Abstract
OBJECTIVE The aim of this study was to determine whether a secure, privacy-preserving record linkage (PPRL) methodology can be implemented in a scalable manner for use in a large national clinical research network. RESULTS We established the governance and technical capacity to support the use of PPRL across the National Patient-Centered Clinical Research Network (PCORnet®). As a pilot, four sites used the Datavant software to transform patient personally identifiable information (PII) into de-identified tokens. We queried the sites for patients with a clinical encounter in 2018 or 2019 and matched their tokens to determine whether overlap existed. We described patient overlap among the sites and generated a "deduplicated" table of patient demographic characteristics. Overlapping patients were found in 3 of the 6 site-pairs. Following deduplication, the total patient count was 3,108,515 (0.11% reduction), with the largest reduction in count for patients with an "Other/Missing" value for Sex; from 198 to 163 (17.6% reduction). The PPRL solution successfully links patients across data sources using distributed queries without directly accessing patient PII. The overlap queries and analysis performed in this pilot are being replicated across the full network to provide additional insight into patient linkages among a distributed research network.
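The token-and-match workflow described in this abstract can be illustrated with a minimal Python sketch. This is not the Datavant algorithm, whose details the abstract does not give; it assumes, purely for illustration, that each site derives a token by applying a keyed hash (HMAC-SHA256) to normalized identifiers with a shared secret. The function names, the field choice, and the `SECRET` value are all hypothetical.

```python
import hashlib
import hmac


def make_token(first: str, last: str, dob: str, sex: str, secret: bytes) -> str:
    """Return a keyed hash of normalized identifiers (hypothetical token format)."""
    # Normalize so that trivial differences in case or spacing do not break a match.
    normalized = "|".join(part.strip().lower() for part in (first, last, dob, sex))
    return hmac.new(secret, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def site_tokens(patients, secret: bytes) -> set:
    """Tokenize every patient held by one site; only tokens leave the site."""
    return {make_token(*patient, secret) for patient in patients}


SECRET = b"shared-demo-key"  # assumption: all sites share one key for this sketch

site_a = [("Ana", "Lopez", "1980-02-01", "F"), ("Bo", "Chen", "1975-07-30", "M")]
site_b = [("ana ", "LOPEZ", "1980-02-01", "f"), ("Cy", "Diaz", "1990-11-12", "M")]

tokens_a = site_tokens(site_a, SECRET)
tokens_b = site_tokens(site_b, SECRET)

# Overlap between a site-pair is the intersection of their token sets;
# the deduplicated patient count is the size of the union.
overlap = tokens_a & tokens_b
deduplicated_total = len(tokens_a | tokens_b)
```

Because only the tokens are compared, the party computing the overlap never sees names or dates of birth. Exact-match tokens of this kind miss records whose identifiers differ by a typo, which is one reason production systems typically combine several token definitions.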
Affiliation(s)
- Daniel Kiernan
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA 02215 USA
- Thomas Carton
- Louisiana Public Health Institute, New Orleans, LA 70112 USA
- Sengwee Toh
- Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA 02215 USA
- Maryan Zirkle
- Cohen Veterans Bioscience, New York, NY 10018 USA
- Darcy Louzao
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC 27710 USA
- Kevin Haynes
- Scientific Affairs, HealthCore, Inc., Wilmington, DE 19801 USA
- Mark Weiner
- Department of Medicine, Weill Cornell Medicine, New York, NY 10021 USA
- Francisco Angulo
- Department of Medicine, Cook County Health and Hospital System, Chicago, IL 60612 USA
- Charles Bailey
- Applied Clinical Research Center, Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, PA 19104 USA
- Jiang Bian
- College of Medicine, University of Florida, Gainesville, FL 32610 USA
- Daniel Fort
- Center for Outcomes and Health Services Research, Ochsner Health, New Orleans, LA 70121 USA
- Shaun Grannis
- Regenstrief Institute, Indiana University, Indianapolis, IN 46202 USA
- Pedro Rivera
- OCHIN, Inc., Portland, OR 97201 USA
- Jonathan Silverstein
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206 USA
- Keith Marsolo
- Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC 27710 USA
- Department of Population Health Sciences, Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC 27710 USA
5
Morton Filter-Based Security Mechanism for Healthcare System in Cloud Computing. Healthcare (Basel) 2021; 9:healthcare9111551. [PMID: 34828597 PMCID: PMC8619796 DOI: 10.3390/healthcare9111551] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/17/2021] [Revised: 11/05/2021] [Accepted: 11/10/2021] [Indexed: 12/28/2022] Open
Abstract
Electronic health records contain the patient’s sensitive information. If these data are acquired by a malicious user, it will not only cause the pilferage of the patient’s personal data but also affect the diagnosis and treatment. One of the most challenging tasks in cloud-based healthcare systems is to provide security and privacy to electronic health records. Various probabilistic data structures and watermarking techniques were used in the cloud-based healthcare systems to secure patient’s data. Most of the existing studies focus on cuckoo and bloom filters, without considering their throughputs. In this research, a novel cloud security mechanism is introduced, which overcomes the shortcomings of existing approaches. The proposed solution enhances security with methods such as fragile watermark, least significant bit replacement watermarking, class reliability factor, and Morton filters included in the formation of the security mechanism. A Morton filter is an approximate set membership data structure (ASMDS) that provides many improvements over other data structures, such as cuckoo, bloom, semi-sorting cuckoo, and rank and select quotient filters. The Morton filter improves security; it supports insertion, deletion, and lookup operations and improves their respective throughputs by 0.9× to 15.5×, 1.3× to 1.6×, and 1.3× to 2.5×, when compared to cuckoo filters. We used Hadoop version 0.20.3, and the platform was Red Hat Enterprise Linux 6; we executed five experiments, and the average of the results has been taken. The results of the simulation work show that our proposed security mechanism provides an effective solution for secure data storage in cloud-based healthcare systems, with a load factor of 0.9.
Furthermore, to aid cloud security in healthcare systems, we presented the motivation, objectives, related works, major research gaps, and materials and methods; we, thus, presented and implemented a cloud security mechanism, in the form of an algorithm and a set of results and conclusions.
6
Lacasse A, Gagnon V, Nguena Nguefack HL, Gosselin M, Pagé MG, Blais L, Guénette L. Chronic pain patients' willingness to share personal identifiers on the web for the linkage of medico-administrative claims and patient-reported data: The chronic pain treatment cohort. Pharmacoepidemiol Drug Saf 2021; 30:1012-1026. [PMID: 33901339 PMCID: PMC8360172 DOI: 10.1002/pds.5255] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/26/2020] [Accepted: 04/20/2021] [Indexed: 12/12/2022]
Abstract
PURPOSE The linkage between patient-reported data and medico-administrative claims is of great interest for epidemiologic research. The goal of this study was to assess the willingness of people living with chronic pain to share personal identifiers on the web for the linkage of medico-administrative and patient-reported data. METHODS This methodological investigation was achieved in the context of the implementation of the chronic pain treatment (COPE) cohort. A web-based recruitment initiative targeting adults living with chronic pain was conducted in the province of Quebec (Canada). RESULTS A total of 1935 participants completed the questionnaire (mean age: 49.86 ± 13.27; females: 83.69%), 921 (47.60%) of whom agreed to data linkage and shared their personal identifiers online (name, date of birth, health insurance number). The most common reasons for refusal were: (1) concerns regarding data security/privacy (25.71%) and (2) the belief that the requested data were too personal/intrusive (13.52%). Some participants did not understand the relevance of data linkage (11.81%). Participants from the COPE cohort and those from the subsample who agreed to data linkage were comparable to other random samples of chronic pain individuals in terms of age and pain characteristics. CONCLUSIONS Although approximately half of the participants refused data linkage, our approach allowed for the implementation of a data platform that contains a diverse and substantial sample. This investigation has also led to the formulation of recommendations for web-based data linkage, including placing items designed to assess willingness to share personal identifiers at the end of the questionnaire, adding explanatory videos, and using a mixed-mode questionnaire.
Affiliation(s)
- Anaïs Lacasse
- Département des sciences de la santé, Université du Québec en Abitibi-Témiscamingue (UQAT), Rouyn-Noranda, Canada
- Véronique Gagnon
- Département des sciences de la santé, Université du Québec en Abitibi-Témiscamingue (UQAT), Rouyn-Noranda, Canada
- Mélissa Gosselin
- Département des sciences de la santé, Université du Québec en Abitibi-Témiscamingue (UQAT), Rouyn-Noranda, Canada
- M. Gabrielle Pagé
- Centre de recherche du Centre hospitalier de l'Université de Montréal (CRCHUM), Montréal, Québec, Canada
- Département d'anesthésiologie et de médecine de la douleur, Faculté de médecine, Université de Montréal, Montréal, Québec, Canada
- Lucie Blais
- Faculté de pharmacie, Université de Montréal, Montréal, Canada
- Line Guénette
- Faculté de pharmacie, Université Laval, Québec, Québec, Canada
- Centre de recherche du CHU de Québec – Université Laval, Québec, Québec, Canada
7
Correll P, Feyer AM, Phan PT, Drake B, Jammal W, Irvine K, Power A, Muir S, Ferdousi S, Moubarak S, Oytam Y, Linden J, Fisher L. Lumos: a statewide linkage programme in Australia integrating general practice data to guide system redesign. Integrated Healthcare Journal 2021. [DOI: 10.1136/ihj-2021-000074] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/04/2022] Open
Abstract
OBJECTIVE With ageing of the Australian population, more people are living longer and experiencing chronic or complex health conditions. The challenge is to have information that supports the integration of services across the continuum of settings and providers, to deliver person-centred, seamless, efficient and effective healthcare. However, in Australia, data are typically siloed within health settings, precluding a comprehensive view of patient journeys. Here, we describe the establishment of the Lumos programme, the first statewide linked data asset across primary care and other settings in Australia, and evaluate its representativeness to the census population. METHODS AND ANALYSIS Records extracted from general practices throughout New South Wales (NSW), Australia’s most populous state, were linked to patient records from acute and other settings. Innovative privacy and security technologies were employed to facilitate ongoing and regular updates. The marginal demographic distributions of the Lumos cohort were compared with the NSW census population by calculating multiple measures of representation to evaluate its generalisability. RESULTS The first Lumos programme data extraction linked 1.3 million patients’ general practice records to other NSW health system data. This represented 16% of the NSW population. The demographic distribution of patients in Lumos was >95% aligned to that of the NSW population in the calculated measures of representativeness. CONCLUSION The Lumos programme delivers an enduring, regularly updated data resource, providing unique insights about statewide, cross-setting healthcare utilisation. General practice patients represented in the Lumos data asset are representative of the NSW population overall. Lumos data can reliably be used to identify at-risk regions and groups, to guide the planning and design of health services and to monitor their impact throughout NSW.
8
Boyd JH, Randall SM, Brown AP, Maller M, Botes D, Gillies M, Ferrante A. Population Data Centre Profiles: Centre for Data Linkage. Int J Popul Data Sci 2020; 4:1139. [PMID: 32935041 PMCID: PMC7473267 DOI: 10.23889/ijpds.v4i2.1139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022] Open
Abstract
The Centre for Data Linkage (CDL) was established at Curtin University, Western Australia, to develop infrastructure to enable cross-jurisdictional record linkage in Australia. The CDL’s operating model makes use of the ‘separation principle’, with content data typically provided to researchers directly by the data custodian; jurisdictional linkages, where available, are used within the linkage process. Along with conducting record linkage, the team has also invested in establishing a research programme in record linkage methodology and in developing modern record linkage software which can handle the size and complexity of today’s workloads. The Centre has been instrumental in the development of practical methods for privacy-preserving record linkage, with this methodology now regularly used for real-world linkages. While the promise of a nation-wide linkage system in Australia has yet to be met, distributed models provide a potential solution.
Affiliation(s)
- J H Boyd
- Centre for Data Linkage, School of Public Health, Curtin University
- Department of Public Health, School of Psychology and Public Health, College of Science, Health & Engineering, La Trobe University
- S M Randall
- Centre for Data Linkage, School of Public Health, Curtin University
- A P Brown
- Centre for Data Linkage, School of Public Health, Curtin University
- M Maller
- Centre for Data Linkage, School of Public Health, Curtin University
- D Botes
- Centre for Data Linkage, School of Public Health, Curtin University
- M Gillies
- Centre for Data Linkage, School of Public Health, Curtin University
- A Ferrante
- Centre for Data Linkage, School of Public Health, Curtin University
9
Rivera DR, Gokhale MN, Reynolds MW, Andrews EB, Chun D, Haynes K, Jonsson-Funk ML, Lynch KE, Lund JL, Strongman H, Bhullar H, Raman SR. Linking electronic health data in pharmacoepidemiology: Appropriateness and feasibility. Pharmacoepidemiol Drug Saf 2020; 29:18-29. [DOI: 10.1002/pds.4918] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/17/2019] [Revised: 08/23/2019] [Accepted: 10/16/2019] [Indexed: 11/06/2022]
Affiliation(s)
- Danielle Chun
- University of North Carolina Gillings School of Public Health, Chapel Hill, North Carolina
- Jennifer L. Lund
- University of North Carolina Gillings School of Public Health, Chapel Hill, North Carolina
10
Using Security Questions to Link Participants in Longitudinal Data Collection. Prevention Science: The Official Journal of the Society for Prevention Research 2019; 21:194-202. [PMID: 31865542 DOI: 10.1007/s11121-019-01080-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
Abstract
Anonymous data collection systems are often necessary when assessing sensitive behaviors but can pose challenges to researchers seeking to link participants over time. To assist researchers in anonymously linking participants, we outlined and tested a novel security question linking (SEEK) method. The SEEK method includes four steps: (1) data management and standardization, (2) many-to-many matching, (3) fuzzy matching, and (4) rematching and verification. The method is demonstrated in SAS with two samples from a longitudinal study of adolescent dating violence. After an initial assessment during a laboratory visit, participants were asked to complete an online assessment either (a) once, 3 months later (Sample 1, n = 60), or (b) three times at 1-month intervals (Sample 2, n = 140). Demographics, eye color, and responses to nine security questions were used as key variables to link responses from the laboratory and online follow-up assessments. The rates of matched cases were 100% in Sample 1 and from 94.3 to 98.3% in Sample 2. To quantify the confidence in the data quality of successfully matched pairs, we reported the means and standard deviations of the number of matched security questions. In addition, we reported the rank order and counts of the mismatched components in key variables. Results indicate that the SEEK method provides a feasible and reliable solution to link responses in longitudinal studies with sensitive questions.
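The general idea behind these steps (standardize the answers, compare every follow-up record with every baseline record, tolerate small spelling differences, then accept only well-supported links) can be sketched as follows. The study itself used SAS; this Python sketch is not the authors' procedure, and the similarity threshold, the minimum number of agreeing answers, and all names and sample records are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def standardize(answer: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(answer.lower().split())


def answers_agree(a: str, b: str, threshold: float = 0.8) -> bool:
    """Exact match after standardization, or a fuzzy match above the threshold."""
    a, b = standardize(a), standardize(b)
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold


def count_agreements(rec_a, rec_b) -> int:
    """Number of security questions whose answers agree between two records."""
    return sum(answers_agree(x, y) for x, y in zip(rec_a, rec_b))


def link(baseline: dict, followup: dict, min_agree: int = 2) -> dict:
    """Compare every follow-up record with every baseline record (many-to-many)
    and keep the best candidate only if enough answers agree."""
    links = {}
    for fid, f_answers in followup.items():
        scores = {bid: count_agreements(b, f_answers) for bid, b in baseline.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= min_agree:  # verification step: reject weak candidates
            links[fid] = best
    return links


# Hypothetical answers to three security questions per participant.
baseline = {"B1": ["rex", "maple street", "blue"], "B2": ["luna", "oak avenue", "brown"]}
followup = {"F1": ["Rex ", "Maple Stret", "blue"], "F2": ["max", "pine road", "green"]}

links = link(baseline, followup)
```

Counting agreeing answers per linked pair also gives the kind of per-pair confidence measure the abstract describes (the number of matched security questions).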
11
Jones KH, Ford DV, Thompson S, Lyons RA. A Profile of the SAIL Databank on the UK Secure Research Platform. Int J Popul Data Sci 2019; 4:1134. [PMID: 34095541 PMCID: PMC8142954 DOI: 10.23889/ijpds.v4i2.1134] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND The Secure Anonymised Information Linkage (SAIL) Databank is a national data safe haven of de-identified datasets principally about the population of Wales, made available in anonymised form to researchers across the world. It was established to enable the vast arrays of data collected about individuals in the course of health and other public service delivery to be made available to answer important questions that could not otherwise be addressed without prohibitive effort. The SAIL Databank is the bedrock of other funded centres relying on the data for research. APPROACH SAIL is a data repository surrounded by a suite of physical, technical and procedural control measures embodying a proportionate privacy-by-design governance model, informed by public engagement, to safeguard the data and facilitate data utility. SAIL operates on the UK Secure Research Platform (SeRP), which is a customisable technology and analysis platform. Researchers access anonymised data via this secure research environment, from which results can be released following scrutiny for disclosure risk. SAIL data are being used in multiple research areas to evaluate the impact of health and social exposures and policy interventions. DISCUSSION Lessons learned and their applications include: managing evolving legislative and regulatory requirements; employing multiple, tiered security mechanisms; working hard to increase analytical capacity efficiency; and developing a multi-faceted programme of public engagement. Further work includes: incorporating new data types; enabling alternative means of data access; and developing further efficiencies across our operations. CONCLUSION SAIL represents an ongoing programme of work to develop and maintain an extensive, whole population data resource for research.
Its privacy-by-design model and UK SeRP technology have received international acclaim, and we continually endeavour to demonstrate trustworthiness to support data provider assurance and public acceptability in data use. We strive for further improvement and continue a mutual learning process with our contemporaries in this rapidly developing field.
Affiliation(s)
- KH Jones
- Population Data Science, Swansea University Medical School, Singleton Park, Swansea SA2 8PP
- DV Ford
- Population Data Science, Swansea University Medical School, Singleton Park, Swansea SA2 8PP
- S Thompson
- Population Data Science, Swansea University Medical School, Singleton Park, Swansea SA2 8PP
- RA Lyons
- Population Data Science, Swansea University Medical School, Singleton Park, Swansea SA2 8PP
12
Brown AP, Randall SM, Boyd JH, Ferrante AM. Evaluation of approximate comparison methods on Bloom filters for probabilistic linkage. Int J Popul Data Sci 2019; 4:1095. [PMID: 32935029 PMCID: PMC7482522 DOI: 10.23889/ijpds.v4i1.1095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022] Open
Abstract
INTRODUCTION The need for increased privacy protection in data linkage has driven the development of privacy-preserving record linkage (PPRL) techniques. A popular technique using Bloom filters with cryptographic analyses, modifications, and hashing variations to optimise privacy has been the focus of much research in this area. With few applications of Bloom filters within a probabilistic framework, there is limited information on whether approximate matches between Bloom filtered fields can improve linkage quality. OBJECTIVES In this study, we evaluate the effectiveness of three approximate comparison methods for Bloom filters within the context of the Fellegi-Sunter model of record linkage: Sørensen-Dice coefficient, Jaccard similarity and Hamming distance. METHODS Using synthetic datasets with introduced errors to simulate datasets with a range of data quality and a large real-world administrative health dataset, the research estimated partial weight curves for converting similarity scores (for each approximate comparison method) to partial weights at both field and dataset level. Deduplication linkages were run on each dataset using these partial weight curves. This was to compare the resulting quality of the approximate comparison techniques with linkages using simple cut-off similarity values and only exact matching. RESULTS Linkages using approximate comparisons produced significantly better quality results than those using exact comparisons only. Field level partial weight curves for a specific dataset produced the best quality results. The Sørensen-Dice coefficient and Jaccard similarity produced the most consistent results across a spectrum of synthetic and real-world datasets. CONCLUSION The use of Bloom filter similarity comparisons for probabilistic record linkage can produce linkage quality results which are comparable to Jaro-Winkler string similarities with unencrypted linkages.
Probabilistic linkages using Bloom filters benefit significantly from the use of similarity comparisons, with partial weight curves producing the best results, even when not optimised for that particular dataset.
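The three comparison methods named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the bigram tokenisation, the filter length of 256 bits, the four keyed hashes, the key `demo-key`, and all function names are assumptions chosen for illustration, and Hamming distance is rescaled to a similarity in [0, 1] so the three scores are comparable. The conversion of scores to partial weights described in the abstract is not reproduced here.

```python
import hashlib
import hmac


def bigrams(value):
    """Split a padded, lower-cased string into its set of character bigrams."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, size=256, num_hashes=4, key="demo-key"):
    """Return the set of bit positions set in a Bloom filter of `size` bits.

    Each bigram is hashed `num_hashes` times with a keyed HMAC-SHA256
    (an illustrative choice, not a vetted privacy scheme).
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hmac.new(key.encode(), f"{k}|{gram}".encode(),
                              hashlib.sha256).digest()
            positions.add(int.from_bytes(digest, "big") % size)
    return positions


def dice(a, b):
    """Sorensen-Dice coefficient: 2|A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard similarity: |A and B| / |A or B|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hamming_similarity(a, b, size=256):
    """One minus the fraction of bit positions in which the filters differ."""
    return 1.0 - len(a ^ b) / size


smith, smyth, jones = (bloom_encode(n) for n in ("smith", "smyth", "jones"))
print(dice(smith, smyth), jaccard(smith, smyth), hamming_similarity(smith, smyth))
print(dice(smith, jones), jaccard(smith, jones), hamming_similarity(smith, jones))
```

Because the Hamming score also counts positions where both filters are unset, it depends on the filter length as well as on the overlap of set bits, whereas Dice and Jaccard depend only on the set bits.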
Affiliation(s)
- AP Brown: Centre for Data Linkage, Curtin University, Perth, Western Australia, Australia
- SM Randall: Centre for Data Linkage, Curtin University, Perth, Western Australia, Australia
- JH Boyd: Centre for Data Linkage, Curtin University, Perth, Western Australia, Australia
- AM Ferrante: Centre for Data Linkage, Curtin University, Perth, Western Australia, Australia
|
13
|
Brown AP, Randall SM, Ferrante AM, Semmens JB, Boyd JH. Estimating parameters for probabilistic linkage of privacy-preserved datasets. BMC Med Res Methodol 2017; 17:95. [PMID: 28693507 PMCID: PMC5504757 DOI: 10.1186/s12874-017-0370-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/19/2016] [Accepted: 06/23/2017] [Indexed: 08/23/2023]
Abstract
Background Probabilistic record linkage is a process used to bring together person-based records from within the same dataset (de-duplication) or from disparate datasets using pairwise comparisons and matching probabilities. The linkage strategy and associated match probabilities are often estimated through investigations into data quality and manual inspection. However, as privacy-preserved datasets comprise encrypted data, such methods are not possible. In this paper, we present a method for estimating the probabilities and threshold values for probabilistic privacy-preserved record linkage using Bloom filters.
Methods Our method was tested through a simulation study using synthetic data, followed by an application using real-world administrative data. Synthetic datasets were generated with error rates from zero to 20%. Our method was used to estimate parameters (probabilities and thresholds) for de-duplication linkages. Linkage quality was determined by F-measure. Each dataset was privacy-preserved using separate Bloom filters for each field. Match probabilities were estimated using the expectation-maximisation (EM) algorithm on the privacy-preserved data. Threshold cut-off values were determined by an extension to the EM algorithm allowing linkage quality to be estimated for each possible threshold. De-duplication linkages of each privacy-preserved dataset were performed using both estimated and calculated probabilities. Linkage quality using the F-measure at the estimated threshold values was also compared to the highest F-measure. Three large administrative datasets were used to demonstrate the applicability of the probability and threshold estimation technique on real-world data.
Results Linkage of the synthetic datasets using the estimated probabilities produced an F-measure that was comparable to the F-measure using calculated probabilities, even with up to 20% error. Linkage of the administrative datasets using estimated probabilities produced an F-measure that was higher than the F-measure using calculated probabilities. Further, the threshold estimation yielded results for F-measure that were only slightly below the highest possible for those probabilities.
Conclusions The method appears highly accurate across a spectrum of datasets with varying degrees of error. As there are few alternatives for parameter estimation, the approach is a major step towards providing a complete operational approach for probabilistic linkage of privacy-preserved datasets.
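As a rough illustration of the EM step mentioned in the Methods, the sketch below estimates match (m) and non-match (u) agreement probabilities from binary field-agreement patterns under the usual Fellegi-Sunter conditional-independence assumption. It is a textbook-style sketch, not the authors' procedure: the function names, starting values, and simulated data are all hypothetical, agreement is reduced to a binary agree/disagree flag per field, and the threshold-estimation extension described in the abstract is not reproduced.

```python
import random


def clamp(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) to avoid degenerate updates."""
    return min(max(x, eps), 1.0 - eps)


def em_match_probabilities(patterns, n_iter=50, m_init=0.9, u_init=0.1, p_init=0.1):
    """Estimate per-field m and u probabilities and the match proportion p.

    `patterns` is a list of tuples of 0/1 agreement flags, one per record pair.
    """
    n, k = len(patterns), len(patterns[0])
    m, u, p = [m_init] * k, [u_init] * k, p_init
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        weights = []
        for gamma in patterns:
            like_m, like_u = p, 1.0 - p
            for j, agree in enumerate(gamma):
                like_m *= m[j] if agree else 1.0 - m[j]
                like_u *= u[j] if agree else 1.0 - u[j]
            weights.append(like_m / (like_m + like_u))
        # M-step: re-estimate parameters from the weighted agreement counts.
        total_m = sum(weights)
        total_u = n - total_m
        p = clamp(total_m / n)
        for j in range(k):
            agree_m = sum(w for w, g in zip(weights, patterns) if g[j])
            agree_u = sum(1.0 - w for w, g in zip(weights, patterns) if g[j])
            m[j] = clamp(agree_m / total_m)
            u[j] = clamp(agree_u / total_u)
    return m, u, p


def simulate_patterns(n, m, u, p, seed=0):
    """Draw synthetic agreement patterns from known m, u and p (illustration only)."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n):
        probs = m if rng.random() < p else u
        patterns.append(tuple(int(rng.random() < q) for q in probs))
    return patterns


pairs = simulate_patterns(5000, [0.95, 0.9, 0.9, 0.85], [0.05, 0.1, 0.1, 0.2], 0.2)
m_hat, u_hat, p_hat = em_match_probabilities(pairs)
print(m_hat, u_hat, p_hat)
```

Only agreement patterns are needed as input, which is why this kind of estimation remains possible when the underlying field values are encrypted and cannot be inspected manually.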
Affiliation(s)
- Adrian P Brown: Centre for Population Health Research, Curtin University, Kent Street, Bentley, Western Australia, 6102, Australia
- Sean M Randall: Centre for Population Health Research, Curtin University, Kent Street, Bentley, Western Australia, 6102, Australia
- Anna M Ferrante: Centre for Population Health Research, Curtin University, Kent Street, Bentley, Western Australia, 6102, Australia
- James B Semmens: Centre for Population Health Research, Curtin University, Kent Street, Bentley, Western Australia, 6102, Australia
- James H Boyd: Centre for Population Health Research, Curtin University, Kent Street, Bentley, Western Australia, 6102, Australia
|